Lily Yang

Home Institution
University of Waterloo

Year Participated

Year in School

REU Faculty Mentor
Reyhaneh Jabbarvand

Research Area Interest
Artificial Intelligence

Project Title
Bug Dataset Generation

Biography & Research Abstract


Using machine learning, and deep learning in particular, for software analysis tasks that involve detecting, localizing, and repairing software bugs first requires a (1) large and (2) high-quality training dataset of buggy and non-buggy versions of code. Existing bug datasets such as Defects4J and BugSwarm contain real-world bugs collected from open-source projects. However, these datasets are relatively small, i.e., each contains around 800 unique bugs. State-of-the-art approaches rely on mutation testing, and specifically higher-order mutants, to inject artificial bugs into code. However, it is still debated whether artificial bugs are representative of real bugs. In this project, we aim to use generative models to learn how to generate bugs that mimic real-world bugs. Such techniques can help generate the large numbers of bugs needed to train machine learning models for code analysis tasks.


My name is Lily Yang, and I am a second-year undergraduate student at the University of Waterloo, majoring in Mathematics with a strong interest in areas including Software Engineering, Machine Learning, and Artificial Intelligence. In summer 2022, in addition to taking part in the UIUC CS REU Program as an undergraduate researcher under Professor Reyhaneh Jabbarvand's supervision, I will begin my first internship, working as a Cloud Engineer at NN Re.