Middle East Technical University
Year in School
REU Faculty Mentor
Research Area Interest
ML Pipeline Testing
Biography & Research Abstract
Developers of ML libraries often write end-to-end/integration tests that check the validity of the ML pipeline: loading the data, initializing the model/algorithm, training the model, and asserting that some metrics are above a pre-selected threshold. To make testing cost-efficient, developers often use a smaller/toy version of the data (selected from the training dataset). However, it is currently unknown whether this choice of data is optimal, i.e., whether it is sufficient to test that the model is learning something useful and to catch potential “accuracy” bugs. Hence, the goal of the project is to study and understand the role of the training “data” used for testing various ML algorithms/models and their implementations. We aim to study and validate the following Hypotheses/Questions:
- Given a (training) dataset D for a model, can we derive a smaller dataset D’ that (1) yields similar accuracy (say, within ~5%) to the original data, (2) has high fault-detection ability, and (3) can be trained on within a reasonable time (for CI)?
- Can we validate whether the model is learning something “useful” (e.g. via some relevant metrics) with the small dataset (derived by us/developer) used in the test?
- Can we develop data attacks that perturb the existing dataset and evaluate how that impacts the test in terms of, for instance, (1) its passing probability, (2) fault-detection ability, and (3) code coverage?
- Can we develop some interpretability metrics with respect to how the models are tested?
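As a minimal sketch of the kind of test these questions concern, the Python example below trains on a full dataset D and on a stratified subset D’, asserts that a metric clears a threshold, and then applies a simple label-flipping perturbation. Everything in it is an assumption made for illustration and not the project's method: the synthetic two-class data, the nearest-centroid "model", the function names, the 0.9 threshold, and the 5% tolerance.

```python
import random

def make_dataset(n_per_class, seed):
    """Synthetic 2-D data (assumption): class 0 near (0, 0), class 1 near (3, 3)."""
    rng = random.Random(seed)
    data = []
    for label, center in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
            data.append((point, label))
    return data

def stratified_subset(data, fraction, seed):
    """Derive D' from D by sampling the same fraction of every class."""
    rng = random.Random(seed)
    by_label = {}
    for point, label in data:
        by_label.setdefault(label, []).append((point, label))
    subset = []
    for label in sorted(by_label):
        rows = by_label[label]
        subset.extend(rng.sample(rows, max(1, int(len(rows) * fraction))))
    return subset

def flip_labels(data, rate, seed):
    """Toy data perturbation: flip each binary label with probability `rate`."""
    rng = random.Random(seed)
    return [(p, 1 - y) if rng.random() < rate else (p, y) for p, y in data]

def train_centroids(data):
    """Stand-in for model training: compute one mean point per class."""
    sums = {}
    for (x, y), label in data:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}

def accuracy(centroids, data):
    """Fraction of points whose nearest class centroid matches their label."""
    def predict(point):
        return min(
            centroids,
            key=lambda c: (point[0] - centroids[c][0]) ** 2
            + (point[1] - centroids[c][1]) ** 2,
        )
    return sum(1 for p, y in data if predict(p) == y) / len(data)

def evaluate(train_data, held_out):
    """Run the whole pipeline: train on `train_data`, score on `held_out`."""
    return accuracy(train_centroids(train_data), held_out)

THRESHOLD = 0.9   # hypothetical pre-selected metric threshold
TOLERANCE = 0.05  # "within ~5%" of the accuracy obtained with the full data

full_train = make_dataset(500, seed=0)                              # D
held_out = make_dataset(200, seed=1)
small_train = stratified_subset(full_train, fraction=0.05, seed=2)  # D'

acc_full = evaluate(full_train, held_out)
acc_small = evaluate(small_train, held_out)
acc_attacked = evaluate(flip_labels(small_train, rate=1.0, seed=3), held_out)

# The integration-test style checks: metric above threshold, and D' close to D.
assert acc_small >= THRESHOLD
assert abs(acc_full - acc_small) <= TOLERANCE
print(f"full={acc_full:.3f} small={acc_small:.3f} attacked={acc_attacked:.3f}")
```

In this toy setting, flipping every label in D’ swaps the two centroids, so the threshold check would fail on the perturbed data. That is the kind of signal the third question asks about, although real pipelines and realistic perturbations are far less clear-cut.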
I am a sophomore Computer Engineering student at Middle East Technical University. During the English preparation semester, I worked in a seven-student team on the project DEVRİM-İHA, an autonomous rotary-wing UAV with a water intake and release mechanism for firefighters. I placed 2nd with my team in the "Artificial Intelligence in Transportation Competition" coordinated by TEKNOFEST (Turkey’s first Aerospace and Technology Festival), in which we detected vehicles, pedestrians, Flying Car Parking (FCP) areas, and Flying Ambulance Landing (FAL) areas from UAV footage. I also participated in the "Artificial Intelligence in Healthcare Competition", where I worked on the detection, classification, and segmentation of brain strokes using ResNet-18 and a state-of-the-art medical image segmentation model named TransFuse.