Yash Nair

Home Institution
Harvard University

Year Participated

Year in School

REU Faculty Mentor
Nan Jiang

Research Area Interest
Artificial Intelligence

Project Title

Biography & Research Abstract


The two topics we focused on this summer were long-horizon concurrent Probably Approximately Correct (PAC) reinforcement learning and Off-Policy Evaluation (OPE) for Partially Observable Markov Decision Processes (POMDPs). For the former, we noticed an immediate extension of the algorithm of Wang et al. (2020) to the concurrent setting, but found the analysis of heuristics harder: in particular, understanding diverse policy sets and how exploration under diverse policies might improve sample complexity. For the latter, we are exploring multi-step extensions of the one-step proxy ideas of Tennenholtz et al. (2019), as well as understanding and analyzing OPE algorithms for POMDPs.
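To make the OPE setting concrete, here is a minimal sketch of the classic trajectory-wise importance-sampling estimator for fully observable MDPs (the baseline that POMDP methods like those above aim to improve on). The function name and data layout are illustrative choices, not from any specific paper: each trajectory is a list of (state, action, reward) tuples collected under a behavior policy, and we reweight its return by the product of probability ratios between the evaluation and behavior policies.

```python
import numpy as np

def ois_estimate(trajectories, pi_e, pi_b, gamma=0.99):
    """Ordinary (trajectory-wise) importance-sampling OPE estimate.

    trajectories: list of trajectories, each a list of (s, a, r) tuples
        collected by running the behavior policy.
    pi_e, pi_b: functions (s, a) -> probability of taking action a in
        state s under the evaluation and behavior policies, respectively.
    gamma: discount factor.
    """
    estimates = []
    for traj in trajectories:
        rho = 1.0  # cumulative importance weight for this trajectory
        ret = 0.0  # discounted return of this trajectory
        for t, (s, a, r) in enumerate(traj):
            rho *= pi_e(s, a) / pi_b(s, a)
            ret += (gamma ** t) * r
        estimates.append(rho * ret)
    return float(np.mean(estimates))
```

When the two policies coincide, every weight is 1 and the estimate reduces to the empirical mean return; the estimator is unbiased in general but its variance grows with the horizon, which is one motivation for the proxy-based POMDP approaches mentioned above.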


I am an undergraduate at Harvard University, concentrating in mathematics and pursuing an M.S. in computer science. I first became interested in reinforcement learning (RL) while working with Professor Finale Doshi-Velez at Harvard, first on RL algorithms for healthcare and later on more theoretical aspects of RL; we recently posted a paper to arXiv entitled "PAC Bounds for Imitation and Model-based Batch Learning of Contextual Markov Decision Processes". This summer I have been working with Professor Nan Jiang on further problems in RL theory!