Wang Wins 2015 SIGKDD Dissertation Award

7/20/2016 3:49:00 PM Laura Schmitt, CS @ ILLINOIS

While data mining technologies are widely used to discover patterns in structured data, such as databases, there's a treasure trove of information hidden in unstructured data, such as event logs, product reviews, emails, and social media messages. Researchers have tried to uncover semantically rich structures from these sources of unstructured text with mixed results, at least until Chi Wang (PhD '14) cracked the code.

Chi Wang

Wang, who worked in CS Professor Jiawei Han's group, developed methods for mining latent entity structures from massive unstructured and interconnected data. He applied them to a variety of domains, such as academic services, event log and news article exploration, and product review analytics.

In August, Wang received the 2015 SIGKDD Dissertation Award, a highly competitive honor in data mining, machine learning, and data science, for his breakthrough research that produces high-quality topics, phrases, and relations with little or no supervision.

“This award motivates me to do even better work and make my techniques and methodologies more useful so people recognize the power of data mining,” he said. “It makes me feel that all the hard work at Illinois was worth it and it helped push my career.”

Wang's approach was to model unstructured data as text-rich heterogeneous information networks capable of handling text-only data (long articles or informal tweets, for example), network-only data (interconnected records), or text plus links. His work established a comprehensive framework for mining hierarchical topics surrounding entities, the roles these entities play, and the relations among them.

According to Wang, his method makes it possible to analyze which topics a particular person (President Obama, for example) is involved in. "I can analyze news articles that mention Obama and provide a hierarchical view of all the different things like politics, national issues, international issues, or healthcare, for example, that he is involved in," explained Wang. "Within each of these topics, I can further find significant subtopics and topical phrases so we can analyze in different granularity."

A second capability of Wang's method is analyzing the roles entities play in a user-specified context, such as a given topical community. Further, given this community and an entity type, his method can determine which entities play the most important roles in the community. "Looking at research publications in CS, we can then learn who the experts are for machine learning, theory, systems, and so on," said Wang. "Again, my method can perform this analysis with a hierarchical view of different areas."

According to Wang, his method can also determine the relationships among entities. "For example, by only using research publication data, we can infer what the relationships are between researchers: who is whose advisor, who co-authored what, what areas they published in, when and how long they collaborated," he explained.

"In summary, my mining approach can [extract] entity structures from unstructured data and linked data, and it can transfer that information into rich, useful knowledge," said Wang, who published a book, Mining Latent Entity Structures, with his advisor Han. "I appreciate Jiawei and all his support. He should get credit for the thesis award."

Professor Han is proud of his student. “My students are very creative, and [Wang] has done really great research overall, publishing about 30 papers when he was here,” said Han. “He also earned a prestigious Microsoft Research PhD Fellowship—the first and only Illinois CS student to get this.”

Today, Wang is a researcher at Microsoft Research in Redmond, Washington, where he is developing algorithms to discover knowledge from unstructured data. These algorithms will be incorporated into products that enable enterprise users to manage data in an easy, interactive way.