Two CS PhD Students Win IBM Fellowships


Jing Gao honored for work in knowledge integration, and Eric Rozier for work on simulating rare events in large-scale systems.

Written by Jennifer LaMontagne, Computer Science and Jenny Applequist, Information Trust Institute

Two Illinois computer science Ph.D. students are recipients of IBM Fellowships for their innovative research. Jing Gao was honored for her work in data mining and knowledge integration, and Eric W.D. Rozier was honored for his work on simulating rare events in large-scale systems.

IBM Ph.D. Fellowships are awarded through a competitive worldwide program. According to IBM, the program “honors exceptional Ph.D. students who have an interest in solving problems that are important to IBM and fundamental to innovation in many academic disciplines and areas of study.” Fellows receive a stipend and an educational allowance covering one academic year of study.

“Illinois computer science students consistently demonstrate themselves to be among the best in the world,” said Rob A. Rutenbar, head of the computer science department and Abel Bliss Professor of Engineering. “Jing and Eric are excellent examples of the kind of innovative research that our students are engaged in to solve some of the most interesting computing challenges of today.”

Jing Gao
As the saying goes, “two heads are better than one.” That principle is the central theme of Jing Gao’s research in knowledge integration. Her work draws on data mining and machine learning to integrate knowledge from multiple information sources, combining multiple base models into ensembles that make better predictions than any single model.

While an abundance of data sources is available to users today, relying on any single model for decision making is risky: no one model is likely to make the best choice in every situation. Combining sources is an attractive way to reduce that risk and produce more accurate predictions.

“But the data is so huge that you can’t really combine it at the data level,” cautions Jing. “It’s only feasible to do at the model level. One of the key trends in information technology is toward increasingly large and disparate sources of data, and that’s why it’s so important that we get this [knowledge integration] right.”
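To make the model-level idea concrete, here is a minimal Python sketch of one simple way base models can be combined: majority voting over their predicted labels. The models and labels are hypothetical, and the sketch is only illustrative; Gao’s research addresses far more general consensus settings across heterogeneous sources.

    # Minimal illustration of model-level combination: each base model has been
    # trained on (or summarizes) a different data source, so only their outputs
    # -- not the raw data -- need to be brought together.
    from collections import Counter

    def majority_vote(predictions):
        """Return the label predicted by the most base models for one example."""
        return Counter(predictions).most_common(1)[0][0]

    # Hypothetical labels from three base models for a single example.
    base_model_outputs = ["spam", "spam", "not spam"]
    print(majority_vote(base_model_outputs))  # -> spam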

Knowledge integration holds the key to intelligent decision making in many emerging applications and has already shown its power in multiple disciplines, including recommendation systems (like the $1 million Netflix Prize), anomaly detection, stream mining, and web applications.

Jing plans to extend the scope of her ensemble methods to several other data mining functions, including ranking, anomaly detection, and veracity analysis.  She also plans to investigate knowledge synthesizing over multiple heterogeneous information networks, as the impact of Web 2.0 technologies continues to expand the amount and type of information available.

Jing works with Prof. Jiawei Han in the Database and Information Systems Laboratory. She received her B.E. and M.E. in Computer Science from the Harbin Institute of Technology in Harbin, China. In 2009, Jing did an internship at the IBM T.J. Watson Research Center, where she designed a supervised discriminative pattern mining algorithm in conjunction with the IBM InfoSphere Warehouse Intelligent Miner.

Eric Rozier
Eric W. D. Rozier, who conducts his research as part of the Information Trust Institute at Illinois, is pursuing research on simulation of rare events in large-scale systems. Such simulation addresses systems in which important events occur at a variety of vastly different rates, such that the fastest and slowest rates in the system may differ by orders of magnitude.

Those systems are challenging to simulate, because in order to model system behavior over a span of time long enough to include an adequate number of rare event occurrences, it may be necessary to model an immense number of frequently occurring events. Since the computation time involved in solving a system model is generally determined by the number of events fired, systems with rare events typically result in “stiff” models that are too large and complex to solve. The research challenge is to find a way to make solution of such systems mathematically tractable.
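A back-of-envelope sketch in Python, using made-up rates, illustrates the stiffness problem: in a naive discrete-event simulation, the expected number of frequent events that must be fired before a single rare event occurs is roughly the ratio of the two rates.

    # Hypothetical rates, chosen only to show the scale of the problem.
    fast_rate = 1e6    # ordinary events per unit time (assumed)
    rare_rate = 1e-2   # rare events per unit time (assumed)

    events_per_rare_occurrence = fast_rate / rare_rate
    print(f"~{events_per_rare_occurrence:.0e} frequent events per rare event")
    # On average, ~1e+08 frequent events must be simulated before one rare
    # event appears, which is why brute-force simulation quickly becomes
    # intractable.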

Storage systems, an application area in which Rozier has been collaborating with IBM researchers, are a notable real-world example of systems with rare events. In addition to their “normal” faults, such as hard disk failures, storage systems are affected by potentially devastating, but exceedingly rare, events called “undetected disk errors.” In such errors, the disk either writes to the wrong sector or reports a successful write to the correct sector when it actually wrote nothing.

“These are very hard to catch,” says Rozier, “and they can cause massive problems. And they become more relevant with systems like Blue Waters, because in large petascale systems, these rare events that before you might not expect to see for maybe 50 years of runtime will now occur roughly every 100 days.”
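The scaling Rozier describes follows from simple arithmetic, under the assumption that error rates grow with the number of components. The short Python sketch below uses assumed, illustrative numbers rather than measurements from Blue Waters: if a single component sees such an error about once in 50 years, a system of many independent components sees one far more often.

    # Illustrative arithmetic only; the component counts and the 50-year
    # figure are assumptions made for the sake of the example.
    years_per_event_single = 50
    days_per_year = 365.25

    for n_components in (1, 50, 180, 1000):
        days_between_events = years_per_event_single * days_per_year / n_components
        print(f"{n_components:>5} components -> one event about every "
              f"{days_between_events:,.0f} days")
    # At a few hundred components, the interval drops to roughly 100 days.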

Rozier earned a bachelor’s degree in Computer Science from the College of William and Mary in 2003, and since 2004 has been a graduate student in the research group of Prof. William H. Sanders at Illinois. In 2008 and 2009 he did two research internships at the IBM Almaden Research Center in San Jose, California. He is also a key member of the development team for Möbius (www.mobius.illinois.edu), a simulation and modeling tool licensed to over 400 academic and industry users.


This story was published March 22, 2010.