Shang's Research Flourishes Under the Right Guidance, Sustains Illinois CS Success in Data Mining

10/1/2020 Aaron Seidlitz, Illinois CS

Former Ph.D. student earned 2020 ACM SIGKDD Dissertation Award Runner-Up, credits adviser and Michael Aiken Chair Jiawei Han with inspiring productive research.

Jingbo Shang

From the time Jingbo Shang started the Illinois Computer Science Ph.D. to milestone moments – like Tripadvisor using his research and open-source code to improve search results – one part of the experience, more than any other, provided him an opportunity to grow.

He said that was the way Jiawei Han, his adviser, guided his studies. Following years of academic and industry-collaboration experience, Han has a firm grasp of what it takes to produce great research and impactful tools the industry giants want to use. Due to this depth of knowledge, Han could dictate to students the right path forward.

According to Shang, though, Han does the opposite.

Instead, he encouraged Shang to investigate several of his own ideas through the first two years of the Ph.D. program. That allowed inspiration to strike with an idea that became his dissertation: “Constructing and Mining Structured Heterogeneous Information Networks from Massive Text Corpora.”

Illinois CS Success in Data Mining Research

  • 2008: Xiaoxin Yin, SIGKDD Dissertation Award
  • 2009: Hong Cheng, SIGKDD Dissertation Award Honorable Mention
  • 2011: Tianyi Wu, SIGKDD Dissertation Award Runner-Up
  • 2013: Yizhou Sun, SIGKDD Dissertation Award
  • 2015: Chi Wang, SIGKDD Dissertation Award
  • 2018: Xiang Ren, SIGKDD Dissertation Award
  • 2019: Chao Zhang, SIGKDD Dissertation Award Runner-Up
  • 2020: Jingbo Shang, SIGKDD Dissertation Award Runner-Up

Shang’s resulting research earned the 2020 ACM SIGKDD Dissertation Award Runner-Up, sustaining recent success Illinois CS Ph.D. students achieved in data mining.

“To me this dissertation award does mean a lot, because it represents being one of the two top young data mining researchers in the world this year,” said Shang, who started an appointment this fall at the University of California, San Diego, as an assistant professor in the Computer Science and Engineering Department and the Halıcıoğlu Data Science Institute. “I want to be a game-changer in this area. Our technology means that people won’t spend too much time on data annotation and curation, or at least we can serve as their first step before they do their annotation.

“We will save tremendous human effort and speed up the AI revolution.”

Shang accepted this research topic as a “grand challenge” in data mining, designed to cut through the overwhelming amount of natural language text data.

This work accounts for sources ranging from news articles and social media posts to medical records and corporate reports. Shang’s methods produced information networks from which actionable knowledge can be generated according to user needs.

“During our early discussions, I spoke to Jingbo a lot to find a specific research direction – and he immediately did great pursuing many important angles,” said Han, a Michael Aiken Chair. “Once he found the right problem, though, he did something even more important. He produced something that wasn’t only for publishing papers. He generated his own algorithm, his own software that many companies in the industry – Google, Microsoft, IBM, etc. – wanted to use.”

As they narrowed Shang’s focus, both shifted gears together to begin a full pursuit of the possibilities.

Jiawei Han

“It’s a lot like learning to drive. If you never sit in the driver seat, you will never learn how to drive,” Shang said. “Jiawei is like your driving coach; he sits next to you, encourages you, tells you that you can do it. But I’m the one driving, and there are some learning moments along the way.

“The first couple of attempts did not go as planned, but then I learned what I needed to and felt comfortable pursuing a daunting topic.”

As he became more comfortable behind the wheel of this research project, Shang recalled a few milestone moments that came to fruition.

The first moment he realized something important was on the horizon came upon receiving an email from Tripadvisor. The self-proclaimed “world’s largest travel platform” loved the idea of using his technology to help users parse through hotel reviews.

The only issue, the company indicated, was that it served people all over the world, so it needed something that could handle languages such as Spanish, French, Chinese, Japanese and Arabic.

“This actually motivated us to do the AutoPhrase portion of our project, which we could see represented a huge potential need at many different companies,” Shang said.

Next up, he felt he had to focus on his writing.

He enrolled in three English as a Second Language courses while at the University of Illinois Urbana-Champaign. He also absorbed all that he could from Han’s writing revisions. Finally, Shang became confident enough to proof papers for more junior (e.g., first-year) Ph.D. students.

The effort he placed in his writing, combined with the advances in his technology, led to his first accepted paper in a research area outside of data mining. That occurred when the Natural Language Processing (NLP) community accepted one of his papers for the EMNLP 2018 conference.

“We proposed a new state-of-the-art supervised Named Entity Recognition (NER) method,” Shang said. “I believed this could be impactful within the NLP community, perhaps even more impactful than the work we did in data mining. When we made the EMNLP 2018 conference, that was a springboard for me – as I now have a lot of NLP papers published in EMNLP and ACL.”

One final milestone taught Shang that industry possibilities could extend beyond what he anticipated.

That’s why he took a summer internship in 2018 with Two Sigma – an investment management company in New York that believes in a scientific approach to investing.

“That internship gave me confidence that my technique could help make money in the stock market,” Shang said. “By using AutoNet for information extraction, I learned that we can reveal certain signals in trading. I also learned that even in the finance domain, there is an application for my work.”

Having steered his research project and dissertation to a successful outcome, Shang fulfilled the potential Han recognized in him years ago.

The adviser remembers first taking notice of Shang during the student’s undergraduate years. As he pursued a bachelor’s in Computer Science from Shanghai Jiao Tong University, Shang was also co-coach of SJTU’s ACM International Collegiate Programming Contest (ICPC) team all four years.

“The first thing about Jingbo, from the time he was an undergraduate student, he has demonstrated super intelligence,” Han said. “Beyond his intelligence, he also played a leadership role in augmenting small research teams involving graduate and even some undergraduate students. That became very popular with our students, because sometimes they have a hard time finding that research guidance they need.

“Jingbo showed how very kind and responsible he is through his actions with others.”

This story was published October 1, 2020.