Muhammet Emin Cihangeri

Home Institution
Middle East Technical University

Year Participated
2022

Year in School
Undergraduate

REU Faculty Mentor
Reyhaneh Jabbarvand

Research Area Interest
Artificial Intelligence

Project Title
ML Pipeline Testing

Biography & Research Abstract

Abstract:

Developers of ML libraries often write end-to-end (integration) tests that check the validity of the ML pipeline: loading the data, initializing the model or algorithm, training the model, and asserting that some metrics are above a pre-selected threshold. To keep testing cost-efficient, developers often use a smaller, toy version of the data (a subset selected from the training dataset). However, it is currently unknown whether this choice of data is optimal, i.e., whether it is sufficient to show that the model is learning something useful and to catch potential "accuracy" bugs. Hence, the goal of this project is to study and understand the role of the training data used in testing various ML algorithms, models, and their implementations. We aim to study and validate the following hypotheses/questions:

  1. Given a (training) dataset D for a model, can we derive a smaller dataset D' that (1) achieves accuracy similar to the original data (say, within ~5%), (2) has high fault-detection ability, and (3) can be trained within a reasonable time for continuous integration (CI)?
  2. Can we validate whether the model is learning something "useful" (e.g., via relevant metrics) with the small dataset (derived by us or by the developers) used in the test?
  3. Can we develop data attacks that perturb the existing dataset, and evaluate how they affect the test in terms of, for instance, (1) its probability of passing, (2) its fault-detection ability, and (3) code coverage?
  4. Can we develop interpretability metrics for how the models are tested?
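The kind of pipeline test described above, and the check in question 1, can be sketched in miniature as follows. This is a hypothetical illustration, not the project's method: the synthetic two-blob data generator, the nearest-centroid model, uniform random sampling to derive D', the 40-example subset size, the 0.9 accuracy threshold, reading "within ~5%" as an absolute accuracy gap of 0.05, and the injected always-predict-0 bug are all assumptions chosen for brevity. The sketch trains on both D and D', compares their test accuracy, and checks that the same test fails on a deliberately broken model (a crude stand-in for fault detection).

```python
import random
from typing import List, Tuple

# A labeled example: (2-D feature vector, class label).
Example = Tuple[Tuple[float, float], int]


def make_dataset(n: int, seed: int) -> List[Example]:
    """Hypothetical stand-in for a real dataset: two Gaussian blobs, labels 0 and 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 if label == 1 else -2.0
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


def subsample(data: List[Example], k: int, seed: int) -> List[Example]:
    """Derive a toy dataset D' by uniform random sampling (only one possible strategy)."""
    return random.Random(seed).sample(data, k)


class NearestCentroid:
    """Toy model under test: predicts the class whose mean feature vector is closest."""

    def fit(self, data: List[Example]) -> "NearestCentroid":
        sums: dict = {}
        counts: dict = {}
        for (x0, x1), y in data:
            s = sums.setdefault(y, [0.0, 0.0])
            s[0] += x0
            s[1] += x1
            counts[y] = counts.get(y, 0) + 1
        # Only classes seen in the training data get a centroid.
        self.centroids = {y: (s[0] / counts[y], s[1] / counts[y]) for y, s in sums.items()}
        return self

    def predict(self, x: Tuple[float, float]) -> int:
        # Squared Euclidean distance to each class centroid; pick the nearest.
        return min(
            self.centroids,
            key=lambda y: (x[0] - self.centroids[y][0]) ** 2 + (x[1] - self.centroids[y][1]) ** 2,
        )


class BuggyNearestCentroid(NearestCentroid):
    """Hypothetical injected 'accuracy bug': ignores the input and always predicts class 0."""

    def predict(self, x: Tuple[float, float]) -> int:
        return 0


def accuracy(model: NearestCentroid, data: List[Example]) -> float:
    return sum(model.predict(x) == y for x, y in data) / len(data)


def run_pipeline_test(model_cls, train, test, threshold=0.9) -> Tuple[bool, float]:
    """End-to-end check: initialize the model, train it, and compare accuracy to a threshold."""
    model = model_cls().fit(train)
    acc = accuracy(model, test)
    return acc >= threshold, acc


if __name__ == "__main__":
    D = make_dataset(2000, seed=0)       # full training data
    D_test = make_dataset(500, seed=1)   # held-out evaluation data
    D_small = subsample(D, 40, seed=2)   # toy dataset D'

    ok_full, acc_full = run_pipeline_test(NearestCentroid, D, D_test)
    ok_small, acc_small = run_pipeline_test(NearestCentroid, D_small, D_test)
    ok_bug, acc_bug = run_pipeline_test(BuggyNearestCentroid, D_small, D_test)

    print(f"D:  passed={ok_full}, accuracy={acc_full:.3f}")
    print(f"D': passed={ok_small}, accuracy={acc_small:.3f}")
    print(f"accuracy gap within 0.05: {abs(acc_full - acc_small) <= 0.05}")
    print(f"buggy model caught (test fails): {not ok_bug}")
```

On data this easy, almost any subset passes, which is exactly the concern the questions raise: a passing test on D' says little unless it also fails on realistic faults. Real studies would swap in the library's own models, datasets, and mutants.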

Bio:

I am a 21-year-old student majoring in Computer Engineering at Middle East Technical University. I came to the university from a science high school in a small town in northern Turkey. As a child eager about science and math, I always dreamed of a career related to science. Over the course of my studies, that path led me to computer science. My interest in machine learning and related fields then grew, which brought me here. In the future, I see myself continuing to work with these ideas, constantly expanding my knowledge and pushing my boundaries.