Bioinformatics & Computational Biology

Bioinformatics lies at the interface of computer science and molecular biology, with the goal of storing, organizing, and analyzing biological information. Research includes developing efficient, scalable algorithms for biomolecular simulation and applying data mining, statistical machine learning, natural language processing, and information retrieval to analyze and mine many kinds of biological data, including DNA sequences, protein sequences and structures, microarray data, and the biology literature, in order to facilitate biological discovery.

Illinois researchers focus on problems pertaining to the phenomenon of “gene regulation” and its evolution. Gene regulation refers to how genes in a cell are switched on (or off) to determine the cell’s functions. It is the reason why, for example, skin and muscle cells differ despite having the same DNA. It is central to a range of biological phenomena, such as development and disease. Moreover, the evolution of gene regulation underlies the amazing diversity of life forms around us.

Learn more about Bioinformatics research at Illinois:


  • Comprehensive maps of gene regulation in various organisms
  • Gene regulation and social behavior
  • Cis-regulatory modules and their discovery through comparative genomics
  • Evolution of modules
  • Models of regulatory function
  • Probabilistic alignment
  • Biology literature access and mining



Computational tools developed by Illinois researchers are available for download and free use by the academic community. 

  • GEMSTAT (Thermodynamics-based modeling of gene expression from regulatory sequences)
  • SWAN (Prediction of binding targets of a transcription factor, characterized by a position weight matrix)
    This Linux-based program performs genome-wide prediction of the regulatory targets of a motif using a Hidden Markov Model. It differs from Stubb in the question it asks: instead of “Does the sequence have more sites than expected from a random (background) model of sequences?”, it asks “Does the sequence have more sites than the average genome-wide frequency of sites?” We have found that this approach leads to more accurate motif target predictions overall.
  • EMMA (Prediction and alignment of cis-regulatory modules)
    This Linux-based program predicts regulatory targets of a motif using two-species comparison. If you have a sequence window of about 100 to 2,000 bp and its orthologous window from another species, use EMMA to score the window for matches to a given motif. EMMA is also useful for aligning cis-regulatory modules (enhancers) between two species, provided you know the relevant transcription factor motifs.
  • GenomeSurveyor (Prediction of motif targets in D. melanogaster)
    This web-based genome browser lets you find regulatory targets of a large collection of transcription factors in the Drosophila genome. You can use cross-species comparison among 12 genomes to see conserved targets.
  • Morph software (Probabilistic alignment of cis-regulatory modules)
  • CRM discovery benchmark (Data sets from D. melanogaster)
  • D2Z software (Alignment-free comparison of regulatory sequences)
  • Indelign software (Probabilistic annotation of indels in multiple alignments)
  • DIPS software (Finding discriminative PWM motifs)
  • Stubb software (Finding cis-regulatory modules)
  • PhyME software (Motif finding in orthologous sequences)
  • YMF software and YMF Web Server (Motif finding)
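The SWAN entry contrasts two questions a target-prediction tool can ask about a sequence. The sketch below illustrates only the second one, comparing a window's density of motif sites against the genome-wide density, under simplifying assumptions. It is not SWAN's Hidden Markov Model: it replaces the HMM with a thresholded count of position weight matrix (PWM) log-odds matches, scans the forward strand only, and assumes uppercase A/C/G/T input. The "GATA"-like PWM, the uniform background, the score threshold, and all function names are hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical PWM for a "GATA"-like motif: one dict of nucleotide probabilities
# per motif position (illustrative values, not taken from SWAN or any database).
PWM = [
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]
# Assumed uniform background nucleotide frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_matrix(pwm, background):
    """Convert motif probabilities to per-position log2 odds against the background."""
    return [{b: math.log2(col[b] / background[b]) for b in "ACGT"} for col in pwm]


def count_sites(seq, lom, threshold):
    """Count forward-strand positions whose summed log-odds score reaches the threshold."""
    width = len(lom)
    hits = 0
    for i in range(len(seq) - width + 1):
        score = sum(lom[j][seq[i + j]] for j in range(width))
        if score >= threshold:
            hits += 1
    return hits


def site_density(seq, lom, threshold):
    """Sites per scanned position; 0.0 if the sequence is shorter than the motif."""
    positions = len(seq) - len(lom) + 1
    if positions <= 0:
        return 0.0
    return count_sites(seq, lom, threshold) / positions


def enrichment_over_genome(window, genome, lom, threshold):
    """Ratio of a window's site density to the genome-wide site density."""
    window_density = site_density(window, lom, threshold)
    genome_density = site_density(genome, lom, threshold)
    if genome_density == 0.0:
        # No genome-wide sites: any site in the window counts as unbounded enrichment.
        return math.inf if window_density > 0.0 else 0.0
    return window_density / genome_density


lom = log_odds_matrix(PWM, BACKGROUND)
window = "GATAGATACCGATA"
genome = "GATA" + "C" * 96  # toy stand-in for a genome sequence
print(count_sites(window, lom, 5.0))  # 3
print(round(enrichment_over_genome(window, genome, lom, 5.0), 2))  # 26.45
```

With this PWM, a perfect "GATA" match scores about 7.06 bits and any single mismatch drops the score below 3, so a threshold of 5.0 counts only exact matches. A ratio well above 1 corresponds to a window with more sites than the genome-wide average; a real tool would also scan the reverse strand and use a background model estimated from the genome.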

Lab Locations

  • 2113 Siebel Center (Data Mining & Bioinformatics)
  • 19 Animal Science Lab (Bioinformatics Systems)
  • 330 Edward R. Madigan Laboratory (Biotechnology Center)


Bioinformatics Application Courses

ANSC 542/CPSC 569/IB 506 Applied Bioinformatics
ANSC 545/CPSC 545/IB 507 Statistical Genomics
CHBE 571/MCB 571/STAT 530 Bioinformatics
CPSC 567 Bioinformatics & Systems Biology
CPSC 558 Quantitative Plant Breeding
CPSC 565 Perl & UNIX for Bioinformatics
CHEM 574 Genomics, Proteomics, Bioinformatics
EPSY 589 Categorical Data in Ed/Psyc
LIS 590BDI Biodiversity Informatics

Bioinformatics Foundation Courses

CS 511 Advanced Database Systems
CS 512 Data Mining Principles
CS 545 Systems Modeling & Simulation
CS 558 Topics in Numerical Analysis
CS 573 Topics in Algorithms
CS 578 Information Theory
CPSC 540 Applied Statistical Methods II
CPSC 541 Regression Analysis
MATH 580/CS 571 Combinatorial Mathematics
STAT 510 Mathematical Statistics I
STAT 525 Computational Statistics
STAT 542 Statistical Learning
STAT 563 Information Theory
STAT 571 Multivariate Analysis
STAT 587 Hierarchical Linear Models
PSYC 594 Multivar Analysis in Psych and Ed
EPSY 582 Advanced Statistical Methods
EPSY 580 Statistical Inference in Educ
EPSY 587 Hierarchical Linear Models
EPSY 588 Covar Struct and Factor Models
LIS 590DC Foundations of Data Curation



Saurabh Sinha: gene regulation, comparative genomics, sequence analysis
Kevin C. Chang: data mining, database systems, machine learning, information retrieval, web search/mining, social media analytics
Jiawei Han: data mining
ChengXiang Zhai: information retrieval, text mining, bioinformatics
Bruce Schatz: bioinformatics


Bioinformatics & Computational Biology Centers & Labs