CS and Life Itself: Warnow Authors Textbook on Computational Phylogenetics

3/14/2018 5:30:44 PM David Mercer, Illinois Computer Science

For close to 20 years, CS Professor Tandy Warnow has taught a class on algorithms and phylogenetic estimation from her own notes because there was no textbook on the subject.

Professor Tandy Warnow's first textbook, "Computational Phylogenetics," exposes CS students to algorithms for phylogenetic estimation.
That changed in December with the publication of Warnow’s new book, “Computational Phylogenetics: An Introduction to Designing Methods for Phylogeny Estimation.”

“Computational Phylogenetics,” her first textbook, is intended to expose graduate-level computer science students to the impact CS can have on phylogenetics—the inference of evolutionary histories and the genetic relationships between living things.

“Biologists compute phylogenies all the time. But as data sets have become larger and we’re using more data from across different species’ genomes, it’s becoming increasingly obvious that it’s a lot more complicated than people thought,” said Warnow, who is a Founder Professor of Engineering with Illinois Computer Science and associate department head.

“But it’s not just complicated in the sense that you need a better statistical model. You actually need better algorithmic designs to be able to analyze the data and get good accuracy, especially when you go to the large data sets,” she said.

How large?

“We’re trying to do evolutionary trees of a million species. That is an unbelievably big computational problem.”

Warnow’s book is being published at a point where computer science is having a deep impact across the sciences. CS is giving researchers the ability to untangle complex problems and comb through massive amounts of data in ways that have never before been possible.

As she writes in the book, phylogenies are a representation of the past, so they can’t be observed, only estimated. And traditionally biologists turned to statisticians for that work.

But high-level computation allows estimation at previously impossible scales, which has the potential to answer some fundamental questions about how life evolved.

Professor Tandy Warnow
“Statisticians have a specific set of tools that they use—computational tools, algorithmic tools and techniques. Those tools are fine on small datasets, (but) they don’t scale with respect to running time, nor with respect to accuracy,” Warnow said. “Computer scientists are great about designing methods that have the ability to provide accuracy on large datasets in reasonable time.”

The problems students will find in the book are pure computer science. Thumb through “Computational Phylogenetics” to a section on large-scale phylogeny estimation and you’ll quickly find yourself reading about NP-hard optimization problems.

“The problems that are in phylogeny are so clean, from a computer science standpoint, that you don’t have to know any biology to work on them,” Warnow said. “They don’t have any biology in them. They’re just natural problems.”

Warnow’s book is dedicated to her PhD advisor at the University of California at Berkeley, Professor Eugene Lawler. He introduced her to computational phylogenetics long before computer science was having such a profound impact on the subject.

“He was a very interesting guy,” she said. “Just a really warm-hearted person.”

From the point where she started gathering her notes with a book in mind, Warnow worked on the textbook for 10-plus years before it was published.

But the process rewarded her with more than just the book.

“I learned a lot,” she said. “I learned material because I had to include it in the textbook that I had not really tried to learn until I had to put it in the textbook.”

Now, she says, she’s considering writing another, this one aimed more at biology students. And she says she would like to write a textbook on another subject of her research, computational historical linguistics.

“It’s a lot of the same computer science, a lot of the same math, but different types of data so slightly different statistical models, so slightly different theory that you end up establishing,” she said.