Nan Jiang Furthers Development in Reinforcement Learning with NSF CAREER Award

8/8/2022 Aaron Seidlitz, Illinois CS

Breaking from a line of RL breakthroughs that do not carry over to many real-world problems, Jiang will seek to develop a different paradigm built on offline RL.

Illinois Computer Science professor Nan Jiang hasn’t wavered from his research focus in reinforcement learning (RL). That includes his time as a PhD student at the University of Michigan, as a postdoctoral researcher with Microsoft Research Lab in New York City, and as a professor here since 2018.

His research sits within artificial intelligence (AI), specifically machine learning (ML). RL concerns how intelligent agents should take actions, and it is one of three ML prongs, alongside supervised learning and unsupervised learning.

For years, breakthroughs in RL research have stemmed from simulator-defined problems, in which learning proceeds by trial and error inside a virtual environment.

Breaking from this tendency, Jiang proposed an NSF CAREER Award-winning project titled “Theoretical Foundations of Offline Reinforcement Learning” that resulted in five years and $500,000 in funding support.

“It is difficult to apply these online algorithms to real-world problems, as trial-and-error is often expensive or impossible in real life,” Jiang stated in his award abstract. “A promising paradigm to address this issue is offline RL, where the agent learns solely from historical data. While the lack of direct interactions with the real environment prevents undesirable real-world consequences, it also gives rise to significant technical challenges in learning.

“This project aims to develop novel methods to address these challenges and provide a deep theoretical understanding for offline RL and make significant progress in enabling offline RL in real-life applications.”

Jiang also noted that these real-life applications include robotics, adaptive medical treatment, online recommendation systems, and more.

However, RL achievements in these areas haven’t been as common. Instead, Jiang noted, the success stories from RL simulators include “achieving human-level performance in video game playing and board games such as Go and Chess.”

While impressed with these technical achievements, Jiang doesn’t believe that they truly represent the potential impact RL offers our society.

“There are a broad family of problems that require data-driven sequential decision-making – which is what RL is supposed to address – but do not come with nearly perfect simulators,” Jiang said. “That’s often because human users, patients, or customers are part of the ‘environment’ in the definition, which causes the simulation to be difficult if not impossible.

“For the same reason, these applications often have high potential societal impacts because they interact with and serve or assist human beings.”

And that’s why Jiang’s inspiration stems from both the high potential of these applications – think personalized medical treatment – and the fact, as Jiang said, “that existing research and algorithms that rely on access to simulators cannot adequately address the challenges in such real-world problems.”

Jiang is thankful for this new opportunity through the NSF CAREER Award, and his track record in this area continues to grow.

Additionally, a paper covering the same topic recently earned selection as an Outstanding Paper Runner Up at the 2022 International Conference on Machine Learning.

While Jiang primarily works in theory, he did note that many of the group’s recent works have attracted industry attention – furthering his belief that they are together pushing toward significant results.

“The ICML 2022 paper considers the problem that offline RL algorithms are often very sensitive and can produce degenerate behavior when their hyperparameters are not properly configured,” Jiang said. “It proposes novel algorithms that come with a type of robustness guarantee. We have received industry inquiries that express interest as this property is exactly what they want in practice.”

Between this work and the forthcoming results from the NSF CAREER Award, Jiang believes a pipeline can be built that brings offline RL into future applications.

“To me, what offline RL really needs is a mature pipeline that practitioners can use to enable RL in sequential decision making tasks in real-world scenarios,” Jiang said.

This story was published August 8, 2022.