Selim Kuzucu

Home Institution
Middle East Technical University

Year Participated

Year in School

REU Faculty Mentor
Reyhaneh Jabbarvand

Research Area Interest
Artificial Intelligence

Project Title
Dataset Augmentation for Better Interpretability of ML Models

Biography & Research Abstract


The ultimate performance of neural models for coding and code analysis tasks depends on two factors: (1) the quality of the training dataset and (2) the neural architecture. State-of-the-art work has focused on the latter factor, i.e., designing neural architectures and models that learn as much as possible from the projects in the dataset. Although these models have improved over time to incorporate semantic information such as control flow, data flow, and structured syntax, they have been shown to be vulnerable to adversarial attacks that slightly change the syntax of the code but preserve its semantics. To advance the state of neural models for code, we aim to focus on the former factor, i.e., to augment existing training datasets in such a way that models learn code semantics. The project involves augmenting the training datasets of existing neural models for code, retraining the models on the augmented datasets, and assessing the impact of the additional data on their performance.


I've been interested in computer science since high school and have competed in both national and international robotics competitions. After entering college, I worked with the KOVAN Robotics Lab and was a member of my college's swarm robotics team, competing in inter-collegiate competitions. I then worked as a researcher at the Scientific and Technological Research Council of Turkey for 6 months, as part of a team developing a DSL for RF devices. I have also interned at General Electric for 6 months, mostly doing full-stack development work. Most recently, I've been working as a researcher in the METU Image Lab for 8 months and in the AFAR Lab at the University of Cambridge for 3 months. At the Image Lab, I'm trying to predict epistemic uncertainty values for images in long-tailed datasets in a single pass, using significantly less time and computational resources than current methods. At the AFAR Lab, I'm working on classifying the co-activations of facial action units in full facial images using GNNs.