Jiaxin Huang and Wing Lam have earned highly competitive doctoral fellowships in recognition of their innovative research.
In addition to the honor, both fellowships provide generous financial support as the students complete their PhD degrees.
Huang, who works with CS Professor Jiawei Han, was one of 10 students from across North America to earn a 2021 Microsoft Research PhD Fellowship. She is exploring ways to mine structured knowledge from unstructured text data with minimal human supervision.
According to Huang, current text mining or natural language processing methods rely on large-scale, human-annotated data for model training. For example, the widely used OntoNotes 5.0 dataset contains 60,000 annotated sentences.
In addition, Huang said, a model trained on one domain of data—news, for example—cannot be used in other domains like financial or medical data.
“My research goal is to develop algorithms that only rely on very few human-given seeds and let the model learn from very large unannotated in-domain corpus to enhance its knowledge,” Huang said. “For example, the same type of entities can appear in similar collocations (local contexts), and we can design algorithms to measure this kind of similarity leveraging a large corpus.”
To date, Huang has developed methods for taxonomy construction and aspect-based sentiment analysis, both of which use very weak supervision, meaning minimal annotated data.
“In my taxonomy construction project, with a small seed taxonomy given by a human, we can leverage a large corpus to recognize related entities, and use sentences including multiple entities to infer the potential relation between parent and child entities in the given taxonomy,” Huang explained. “More entities can be extracted from the taxonomy to obey the same relation and therefore we can expand the taxonomy structure to be a more complete one.”
In aspect-based sentiment analysis, Huang has developed an embedding-based framework that automatically learns the topic of each online review sentence. Given an aspect category for a restaurant, such as service, food, location, or ambience, her model learns from many unlabeled review sentences to automatically extract keywords related to each aspect.
“This method is based on the assumption that similar words tend to appear in similar local contexts or similar documents,” she said. “My model not only outputs the topic words for each aspect, but also classifies reviews in correct aspects.”
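The assumption Huang describes, that similar words appear in similar local contexts, can be illustrated with a small Python sketch. This is not her embedding-based model: it uses plain co-occurrence counts and cosine similarity, and the function names and toy review sentences are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_similarity(seed, vectors, candidates):
    """Order candidate words by how closely their contexts match the seed's."""
    return sorted(candidates, key=lambda w: cosine(vectors[seed], vectors[w]), reverse=True)


# Toy, made-up review sentences (an assumption for this example only).
reviews = [
    "the waiter was friendly and fast",
    "the server was friendly and fast",
    "the pizza was tasty and cheap",
    "the pasta was tasty and cheap",
]
vectors = context_vectors(reviews)

# Starting from the seed word "waiter" for a "service" aspect,
# "server" ranks first because it shares the same local context.
ranked = rank_by_similarity("waiter", vectors, ["server", "pizza", "pasta"])
print(ranked)
```

In this toy corpus, the candidate that shares the seed word's surroundings ranks highest, which is the kind of signal that can be measured across a large unlabeled corpus.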
In the summer of 2020, Huang worked as an intern at Microsoft Research on Named Entity Recognition (NER), a fundamental task in natural language processing. NER takes pieces of unstructured text as input, locates certain words or phrases in sentences, and classifies them into specific entity types, such as persons, organizations, locations, dates, and quantities. Working with Microsoft researchers Chunyuan Li and Krishnan Subudhi, Huang focused mainly on the few-shot learning setting, in which only a few annotated sentences are given for each entity type.
Winning this Fellowship has inspired Huang to continue along her research path and produce additional impactful results. The Fellowship provides a tuition and fee waiver along with a $42,000 stipend and an invitation to attend a two-day PhD Summit workshop hosted by Microsoft Research, where Huang can meet with the other Fellows and Microsoft researchers.
“I really want to thank Professor Jiawei Han and the CS department for encouraging me to apply for this Fellowship,” she said. “Professor Han also has given me many wise ideas and suggestions for both my PhD study and the Fellowship application.”
Lam was among six doctoral students nationwide to receive a $25,000 Google Center for Minorities and People with Disabilities in IT (CMD-IT) LEAP Dissertation Fellowship.* His research aims to improve software dependability by identifying and fixing software problems during the testing phase. He is co-advised by CS Professors Tao Xie and Darko Marinov.
According to Lam, developers typically perform software testing to ensure that their code changes do not break existing functionality. During software testing, developers often waste time debugging their code changes because of spurious failures from flaky tests, which are tests that can non-deterministically pass or fail on the same code.
“In recent years, many companies, such as Apple, Google, Facebook, and Microsoft have published blogs and research papers about these spurious failures, misleading developers about their code changes, wasting developers’ time, and reducing developers’ trust in testing,” Lam explained.
Lam’s work on characterizing flaky tests has helped establish these tests as a new area of research. He created iDFlakies, an effective tool that employs several detection algorithms to detect flaky tests and automatically categorize them as order-dependent (OD) or non-order-dependent (NOD).
“Using iDFlakies, I started an increasingly-used dataset of 2,000-plus flaky tests in popular, open-source projects,” Lam said.
Lam has also proposed iFixFlakies, the first tool for automatically fixing flaky tests. He has used this tool to fix hundreds of flaky tests in popular, open-source projects.
In another area of research, Lam proposed an effective testing tool that was adopted by Tencent, the company that makes the popular WeChat messaging app used by more than one billion users each month. Working with researchers at Fujitsu, Lam also co-published a popular dataset of real-world bugs for program repair evaluation.
Lam, who expects to graduate in the summer of 2021, looks forward to becoming a tenure-track faculty member so he can teach and continue conducting research.
“I am extremely grateful to Google and the CMD-IT organization for this fellowship, which has helped reaffirm my belief that my work is of high importance and interest to practitioners and researchers,” said Lam, who was diagnosed with an invisible disability. “I believe that much work needs to be done to address the lack of students from underrepresented groups in computer science, and I will use the support from this Fellowship to help increase the computer-science participation of students from underrepresented groups, especially those with disabilities.”
*The Google CMD-IT LEAP Dissertation Fellowship—Diversifying LEAdership in the Professoriate—was formerly known as the Diversifying Future Leadership in the Professoriate (FLIP) Fellowship.