Mining diagnostic sequences and comparative analysis for contagious diseases have long been overlooked. COVID-19 reaching pandemic status changed that.
For years, computational biologists have used comparative analysis to study contagious diseases. This approach reveals similarities and differences among these diseases, helping the scientific and medical communities find better ways to diagnose, treat and cure patients.
The hard part was getting others to listen.
But that all changed in March, as these same communities came to grips with the reality of the COVID-19 pandemic. Suddenly this type of work found an eager audience.
Aware of this shift in perception, Illinois Computer Science Department Head and Abel Bliss Professor of Engineering Nancy M. Amato, Illinois CS professor Lawrence Rauchwerger and Rice University assistant professor Todd Treangen joined forces.
These three researchers launched a project that received funding from the C3.ai Digital Transformation Institute. It's called “Mining Diagnostics Sequences for SARS-CoV-2 Using Variation-Aware, Graph-Based Learning Approaches Applied to SARS-CoV-1, SARS-CoV-2, and MERS Datasets.”
“This started due to significant increases in data available as attention to COVID-19 increased. One of the things that particularly excites me about this project is that we can now press further and study how this virus evolves,” Amato said. “We will be harnessing tools such as parallel processing and machine AI to analyze all this data. That should then give people the tools they need to actually understand what’s going on.”
Amato’s interest builds on her own long-established expertise in computational biology, as well as Treangen’s.
Together, she believes, this workgroup can produce a study that delves into areas that others haven’t broached yet.
“Current approaches have focused primarily on ‘interhost’ differences – achieved by comparing across COVID-19 positive patients – and on datasets that involve thousands to tens of thousands of genomes,” Amato said. “We are going to be developing novel bioinformatics algorithms that can use parallel processing to scale up to more than 100,000 genomes and offer a deep dive into ‘intrahost’ differences or variants.”
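The distinction Amato draws can be illustrated with a minimal Python sketch. It is not the project's method: the function names, the 20% frequency threshold and the toy reads below are all assumptions made up for illustration, and real pipelines work on aligned sequencing data at far larger scale. Interhost analysis compares one consensus genome per patient, while intrahost analysis looks for minority variants among the reads from a single patient.

```python
from collections import Counter


def consensus(reads):
    """Majority base at each position across one patient's aligned reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))


def interhost_differences(cons_a, cons_b):
    """Positions where two patients' consensus sequences disagree."""
    return [i for i, (a, b) in enumerate(zip(cons_a, cons_b)) if a != b]


def intrahost_variants(reads, min_freq=0.2):
    """Positions where a non-majority base reaches min_freq within one patient.

    min_freq is a hypothetical threshold chosen for this toy example.
    """
    variants = {}
    for i, col in enumerate(zip(*reads)):
        counts = Counter(col)
        major = counts.most_common(1)[0][0]
        minor = {
            base: n / len(col)
            for base, n in counts.items()
            if base != major and n / len(col) >= min_freq
        }
        if minor:
            variants[i] = minor
    return variants


# Toy, made-up aligned reads (equal length), not real viral sequences.
patient_a = ["ACGTAC", "ACGTAC", "ACGTAC", "ACTTAC"]
patient_b = ["ACGTAT", "ACGTAT", "ACGTAT", "ACGTAT"]

between = interhost_differences(consensus(patient_a), consensus(patient_b))
within_a = intrahost_variants(patient_a)
print("interhost differences at positions:", between)
print("intrahost variants in patient A:", within_a)
```

In this toy case the two patients' consensus sequences differ only at the last position, while patient A also carries a minority variant that a consensus-only comparison would never see.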
The increase in genomes available to analyze is an opportunity, but it also presents a challenge.
According to Rauchwerger, the workgroup will draw upon this much larger dataset in a way that allows for realistic application.
“You have to consider that we don’t really know what makes this virus change. Then it becomes clear we have a humongous amount of computation ahead to simulate how it will change within one person,” Rauchwerger said. “And then when you consider there are more than 100,000 genomes to analyze, it’s clear we have to computationally add some smartness into this process. We can at least then prune the parts that aren’t significant to uncover the parts that need further investigation.”
Meanwhile, Treangen will continue his focus on comparative analysis and COVID-19 diagnostics.
The Rice University professor said that initial studies provided limited glimpses into the genomic similarities and differences between SARS-CoV-2 and other deadly coronaviruses, such as SARS-CoV-1 and MERS. By comparing SARS-CoV-2 with SARS-CoV-1 and MERS, Treangen hopes to pinpoint how a genome with so few differences compared to the others can “wreak such havoc across the world.”
He also wants to continue analyzing the diagnostics behind COVID-19, first by ensuring that the testing is as accurate as needed.
Treangen is also aware that people might question why there is a need to improve testing when mass testing already exists. He pointed out that the virus can change at any point, because viruses thrive off a host and do what they must to survive. Those changes are worth understanding further as scientists and clinicians strive toward better therapies and a vaccine.
“People may assume the timing of this point is off since we are already doing such widespread testing. Why introduce more testing now? Well, the point focuses on ensuring our processes are sound as we develop therapies,” Treangen said.
The final piece to this effort is something that Rauchwerger called “a bit pie-in-the-sky.”
Still, the opportunity to apply prediction to the spread of COVID-19 fascinates him. Rauchwerger wants to start by using this project to better understand how the virus is reacting compared to other, similar coronaviruses. Then he will consider the way it is reacting both across positive patients and within positive patients. This, Rauchwerger believes, will provide enough knowledge to begin predicting how the disease will change next.
“We also think about, and I fell in love with the notion of, prediction,” he said. “If we can predict the way things are going to change, we could at least begin narrowing possibilities for biologists.
“This could produce an interactive simulation to prove whether certain results were possible, helping the community understand what this disease might do next.”