Bioinformatics and Computational Biology

Our researchers work on core problems in computational biology, including genomics, proteomics, metagenomics, and phylogenomics. We develop novel techniques that combine ideas from mathematics, computer science, probability, statistics, and physics. We also help identify and formalize computational challenges in the biological domain and experimentally validate novel hypotheses generated by our analyses.

We are developing algorithms with improved accuracy for large-scale and complex estimation problems in phylogenomics (genome-scale phylogeny estimation), multiple sequence alignment, and metagenomics. We are exploring gene regulation by developing advanced techniques to predict the diverse functions of noncoding parts of DNA and to relate interspecies and interpersonal differences in DNA to differences in the organism’s form and function. We work broadly on machine learning techniques for computational biology, with research spanning molecular and structural biology; networks and systems biology; and molecular mechanisms of human disease.

Strengths and Impact

Awards

Highlights from the last five years include several honors and awards to our BCB faculty: the Sloan Research Fellowship and the Chris Overton Prize (the highest award given by the International Society for Computational Biology (ISCB) to a junior researcher) to Jian Peng; election as AAAS Fellow, ACM Fellow, and ISCB Fellow, and award of the Grainger Distinguished Chair in Engineering, to Tandy Warnow; a CAREER award to Mohammed El-Kebir; and award of the Founder Professor in Engineering to Saurabh Sinha. The BCB group has particular strengths in regulatory genomics (the focus of Saurabh Sinha); phylogenetics and multiple sequence alignment (the focus of Tandy Warnow and also of interest to Jian Peng); and cancer genomics (the main focus of Mohammed El-Kebir and also of interest to Jian Peng).

Research Breakthroughs

Mohammed El-Kebir has made important advances in the theoretical foundations of cancer phylogenetics and developed methods for the estimation of cancer phylogenies from sequencing data of tumors. Recent breakthroughs include PhyDOSE (Weber et al., PLOS Computational Biology 2020), a method to design cost-effective single-cell DNA sequencing experiments of tumors, and RECAP (Christensen et al., ECCB 2020), a method to detect repeated patterns of tumor evolution in cancer patient cohort data.

Another strength of the group is protein structure and function prediction, the main focus of Jian Peng. His deep learning-based method, DeepContact (Liu et al., Cell Systems 2018), has been a strong performer in the biennial community-wide Critical Assessment of protein Structure Prediction (CASP) competition and became the foundation for the recent breakthrough in this field. His group has also developed several popular function prediction algorithms, including Mashup for network integration (Cho et al., Cell Systems 2016), DTINet for protein-drug interaction prediction (Luo et al., Nature Communications 2017), and TransposeNet for studying gene function in neurodegenerative diseases (Khurana et al., Cell Systems 2017).

Regulatory genomics is the study of gene regulation, and Saurabh Sinha’s group develops computational techniques for this field. His group recently showed how multi-omics data can be analyzed through a probabilistic model to reveal key regulators of colorectal cancer progression (Ghaffari et al., Genome Biology 2021). Another study from the group developed a state-of-the-art simulator for single-cell expression data based on given gene regulatory networks (Dibaeinia & Sinha, Cell Systems 2020).

Phylogenomics, the estimation of evolutionary histories, is another important strength of the group and the main focus of Tandy Warnow. Recent breakthroughs include MAGUS (Smirnov & Warnow, Bioinformatics 2020), a method for large-scale multiple sequence alignment that is more accurate than the previous best methods (PASTA and UPP), and the first proofs of statistical consistency (Legried et al., RECOMB 2020 and Molloy & Warnow, ISMB 2020) for species tree estimation methods that address gene duplication and loss.

PhD Placements

The BCB group is fully committed to placing its PhD students and postdocs in top academic and research positions in the United States and abroad. Erin Molloy from Warnow’s group has accepted a faculty position in the CS department at the University of Maryland starting Fall 2021. From the Sinha group, Jaebum Kim is now a faculty member at Konkuk University in Korea, Xin He is an assistant professor at the University of Chicago, Jin Tae Kwak is a faculty member at Korea University, Majid Kazemian is an assistant professor at Purdue University, Md. Abul Hassan Samee is an assistant professor at Baylor College of Medicine, and Amin Emad is an assistant professor at McGill University. Sheng Wang from Peng’s group is now an assistant professor at the University of Washington in Seattle.

Research Efforts and Groups

Seminars

Illinois Computer Science Speaker Series: Brings prominent leaders and experts to campus to share their ideas and promote conversations about important challenges and topics in the discipline.

Faculty & Affiliate Faculty

Nancy M. Amato

Modeling Molecular Motions, Protein Folding, Protein/Ligand Binding

Mohammed El-Kebir

Bioinformatics, Cancer Genomics, Cancer Phylogenetics, Phylodynamics, Phylogeography, Information Visualization

William Gropp

Parallel Algorithms, Genomics, Computational Phylogenetics, High-Performance Computing

Jiawei Han

Mining Biological Text, Biological Named Entity and Relation Extraction

Ravi Iyer, Electrical & Computer Engineering

Individualized Medicine, Health Data Analytics, Probabilistic Graphical Models, Multi-omics, Neuroscience, Pharmacogenomics

Ge Liu

Bioinformatics and Computational Biology

Hongye Liu

Applied machine learning methods in Bioinformatics; Algorithm development for big data analysis; Data visualization; Integrative bio-medical data analysis; Single cell dynamic biological data analysis; High throughput genomic profiling data analysis.

Olgica Milenkovic, Electrical & Computer Engineering

Compressive Genomics, Information Theory

Jian Peng

Bioinformatics, Protein Function and Structure, Systems Biology, Machine Learning and Optimization

Saurabh Sinha

Bioinformatics, Genomics, Modeling, Sequence Analysis, Machine Learning, Probabilistic Methods, Cancer, Behavior

Brad Solomon

Computational Genomics

Jimeng Sun

Deep Learning for Drug Discovery, Molecule Property Prediction and Generation, Genomic and Phenotypic Modeling

Tandy Warnow

Graph Algorithms, Statistical Estimation, Heuristics for NP-Hard Optimization Problems, Phylogenomics, Metagenomics, Multiple Sequence Alignment