Deep reinforcement learning has achieved remarkable success, especially in games and other applications whose environments are simulated or carry low exploration costs. For most critical industrial applications, however, interacting with the environment is very costly, and unsafe exploration can lead to disaster. In this setting, a new paradigm of deep reinforcement learning is greatly needed. In this talk, the researchers will introduce a new framework called continual offline reinforcement learning and discuss how to better balance policy improvement against global convergence within it. They will also discuss how to evaluate an offline-learned policy more accurately before deploying it in real environments. After that, they will present several real-world examples in which continual offline reinforcement learning was applied to solve difficult problems in the industrial domains of logistics and supply chain. Finally, the researchers will discuss the remaining challenges and technical trends in this important space.
Learn more about the 2021 Microsoft Research Summit: https://Aka.ms/researchsummit