Jian Peng has been a professor of computer science at UIUC since 2015. Before joining Illinois, Jian was a postdoc at CSAIL at MIT and a visiting scientist at the Whitehead Institute for Biomedical Research. He obtained his Ph.D. in Computer Science from Toyota Technological Institute at Chicago in 2013. His research interests include bioinformatics, cheminformatics, and machine learning. Algorithms developed by Jian and his co-workers were successful in several scientific challenges, including the Critical Assessment of Protein Structure Prediction (CASP) competitions and a few DREAM challenges on translational medicine and pharmacogenomics. Recently, Jian has received the Overton Prize, an NSF CAREER Award, a PhRMA Foundation Award, and an Alfred P. Sloan Research Fellowship.
- Associate Professor, Department of Computer Science
1. Computational Molecular and Systems Biology
Protein structure, interaction, and function are by nature intertwined, with structure, or structural properties, playing a large role in defining function and in understanding human diseases. As such, determining protein structure has been one of the most important challenges in biology. In parallel, given the many obstacles to experimental structure determination, computational prediction of protein structure remains one of the longest-standing challenges in computational biology. My group has developed a number of machine learning algorithms to tackle this problem. In particular, DeepContact, an approach that employs deep learning for protein contact prediction, has achieved substantial improvement over previous approaches, highlighted by its performance (as a co-winner) in the Critical Assessment of Protein Structure Prediction (CASP12) in 2016. In another recent work, we developed DeepSignal, a recurrent neural network model for studying protein function, such as phosphorylation and motif binding. Applied to the cellular signaling system, DeepSignal not only improves prediction but also enables mutational analysis for studying functional variants in human diseases, such as cancer. We have since extended the model to other types of protein-substrate interactions, including protein-RNA binding and protein-peptide binding. Most recently, we developed a deep-learning model that integrates evolutionary representations for efficient protein design and engineering.
Moving beyond individual molecules, a systematic understanding of their interactions and functions, drawn from heterogeneous datasets, is critical for both biological and translational medicine research. I have been developing efficient machine learning algorithms, especially integrative manifold learning algorithms, to extract information from various interactomic datasets. My algorithm, Mashup, takes full advantage of network-specific topology by learning a canonical representation that best integrates the topological patterns across multiple biological networks. Its substantial improvements over state-of-the-art methods in distinct functional inference tasks demonstrate its applicability to deciphering the functional properties of genes from interactomes. Based on Mashup, my group has developed generalized versions that incorporate other data types, including text, homology, and structures.
2. Pharmacogenomics and Human Disease Genomics
Computational prediction of drug–target interactions (DTIs) has become an important step in the drug discovery or repositioning process, aiming to identify putative new drugs or novel targets for existing drugs to accelerate drug discovery. Collaborating with biologists, my group has developed DTINet, a method based on Mashup that integrates diverse drug information for DTI prediction. Validated on benchmarks and by experiments, DTINet offers a practical and accurate tool to predict unknown DTIs, which may provide new insights into drug discovery or repositioning. Furthermore, I have been working with doctors and medical researchers, applying my algorithms to the study of human diseases. In a recent Cell Systems article, we systematically mapped molecular pathways underlying the toxicity of alpha-synuclein, a protein central to Parkinson’s disease. To translate findings from yeast screens, we developed a computational method that integrates a Steiner prize-collecting approach with homology assignment through Mashup integration of sequence, structure, and interaction topology. This work was featured as the cover article of the issue and received multiple highlights in the media and a Bishop Dr. Karl Golser Paper Award. Based on the network, we performed high-depth exome capture of these genes in 500 patients with synucleinopathy and identified novel rare variants that are enriched in carriers of mutations in LRRK2 and GBA. Collaborating with researchers from the UCSD School of Medicine, we have identified a novel way to stratify cancer patients by their tumor mutations using networks.
3. Data Analytics and Machine Learning
In addition to bioinformatics problems, we have been working on developing new machine learning algorithms for data analysis and deep reinforcement learning algorithms for process optimization of scientific experiments. Collaborating with chemists, we have been developing a fully automated molecular synthesis system powered by deep learning and reinforcement learning algorithms.
- NSF CAREER Award (2017)
- PhRMA Foundation Award in Informatics (2017)
- NCSA Faculty Fellowship (2016)
- Sloan Research Fellowship (2016)
- CS 466 - Introduction to Bioinformatics
- CS 598 - Machine Learning in Computational Biology
- STAT 361 - Prob & Stat for Computer Sci