OMSCS Student Uses Machine Learning to Help Understand Covid-19

Tuesday, May 5, 2020

College of Computing OMSCS

With dozens of research papers about Covid-19 being published each week, it can be difficult for doctors and scientists to find and read the most important studies.

A student at Georgia Tech, however, is using artificial intelligence (AI) techniques like natural language processing and machine learning (ML) to narrow down the most relevant information in this growing data set.

Kenneth Miller, a student in Georgia Tech’s Online Master of Science in Computer Science (OMSCS), is using these tools to develop algorithms to ensure that the most important Covid-19 research reaches doctors. His work is part of an ongoing challenge to use ML to empower the medical community to find the best Covid-19 studies.

Information Overload

The challenge started when Kaggle, Google's online data science and ML community, partnered with the White House and several leading research groups to create the Covid-19 Open Research Dataset (CORD-19). With more than 47,000 scholarly articles about Covid-19 and other coronaviruses, it’s one of the most comprehensive research databases for the pandemic.

To sift through the data, Kaggle released CORD-19 to its community and asked members to use it to answer some of the toughest research questions about Covid-19. As an incentive, for every task completed successfully, participants like Miller receive $1,000 in prize money.

As an OMSCS student specializing in ML, Miller had joined a few previous Kaggle challenges, but on much lower-stakes tasks such as predicting home values or NCAA brackets. For Miller, working on this dataset presented an especially relevant problem.

“I am fascinated with everything AI, so when I heard about this, I figured if any of my skills could help anyone, I should try,” said Miller, who is a lawyer outside of his studies.

Keep it Simple

Miller said his OMSCS studies prepared him for the challenge. The AI track focuses on the practical implementation of AI methods, which made it easier for Miller to start with an overwhelming amount of data and reach an endpoint that solves the problem. His experience using the programming language Python in his classes also enabled him to work nimbly with the data.

Armed with this knowledge, Miller applied a strategy he uses on every project.

“Whenever I start a new project, I try and see if I can craft a simple yet effective solution from scratch,” he said.

He has worked on specific Kaggle challenges where he could apply this strategy. The first ML model Miller developed finds the most relevant sentences in a study. To accomplish this, he used a simple scoring algorithm that counts how many times keywords appear in a sentence. The model then computes the ratio of keyword occurrences to sentence length.
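To make the sentence-scoring idea concrete, here is a minimal Python sketch. It is not Miller's code: the function names `score_sentence` and `top_sentences`, the regular-expression tokenizer, the naive sentence splitter, and the sample keywords are all hypothetical, and the sketch only matches single-word keywords. It counts keyword hits in each sentence and divides by the sentence's length in words, as described above.

```python
import re


def score_sentence(sentence: str, keywords: set[str]) -> float:
    """Return keyword occurrences divided by sentence length in words."""
    # Hypothetical tokenizer: lowercase alphanumeric words, allowing hyphens.
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in keywords)
    return hits / len(words)


def top_sentences(text: str, keywords: set[str], k: int = 3) -> list[tuple[float, str]]:
    """Return the k highest-scoring sentences as (score, sentence) pairs."""
    # Naive splitting on ., ! or ? followed by whitespace (an assumption).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = [(score_sentence(s, keywords), s) for s in sentences]
    # Highest ratio first; Python's sort is stable, so ties keep text order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Made-up sample text and keywords, for illustration only.
    abstract = (
        "The median incubation period was five days. "
        "Patients were recruited from three hospitals. "
        "Incubation estimates varied by age."
    )
    for score, sentence in top_sentences(abstract, {"incubation", "period", "days"}, k=2):
        print(f"{score:.2f}  {sentence}")
```

Dividing by sentence length keeps a long sentence from outranking a short one simply because it has more room to mention keywords. A fuller version would likely need stemming, multi-word phrases, and a more robust sentence splitter.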

For a separate challenge, Miller created a search engine for common Covid-19 research questions, such as: What is the average time the disease takes to incubate? How long is it contagious? How long until symptoms appear?

Up to the Challenge

These are just a few of Miller’s models, and he continues to work on new challenges Kaggle offers. Tasks now include deep dives into epidemiology, determining how many patients a study was based on, and identifying which scientific method was employed.

“The trick, as in any project like this, is understanding and assimilating the data to start with,” Miller said. “But using Python makes the initial data wrangling pretty easy. The hardest part is building new ways to squeeze more desired info out of the documents.”

Miller’s efforts have been noticed. His work has been cited several times on the challenge’s contributions page.

For more coverage of Georgia Tech’s response to the coronavirus pandemic, please visit our Responding to Covid-19 page.

Tess Malone
