Year in School
REU Faculty Mentor
Research Area Interest
Biography & Research Abstract
The two topics we focused on this summer were Long-Horizon Concurrent Probably Approximately Correct (PAC) Reinforcement Learning and Off-Policy Evaluation (OPE) for Partially Observable Markov Decision Processes (POMDPs). For the former, we noticed an immediate extension of the algorithm of Wang et al. (2020) to the concurrent setting, but found it difficult to analyze heuristics — in particular, to understand diverse policy sets and how exploration under diverse policies might improve sample complexity. For the latter, we are exploring multi-step extensions of the one-step proxy ideas in Tennenholtz et al. (2019), as well as understanding and analyzing OPE algorithms for POMDPs.
I am an undergraduate at Harvard University, concentrating in mathematics and pursuing an M.S. in computer science. I became interested in reinforcement learning (RL) while working with Professor Finale Doshi-Velez at Harvard, initially on RL algorithms for healthcare and later on more theoretical aspects of RL; we recently posted a paper to arXiv entitled "PAC Bounds for Imitation and Model-based Batch Learning of Contextual Markov Decision Processes". This summer I've been working with Professor Nan Jiang on more problems in RL theory!