[Department Colloquium & Hesi (荷思) Alumni Lecture] No. 5 || q-Learning in Continuous Time - Department of Mathematical Sciences, Tsinghua University

Published: 2025/05/12

Title: q-Learning in Continuous Time

Speaker: Yanwei Jia (The Chinese University of Hong Kong)

Time: Friday, May 16, 2025, 6:30-8:00 p.m.

Venue: Room A304, Science Building (理科楼)

Abstract: We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced by Wang et al. (2020). As the conventional (big) Q-function collapses in continuous time, we consider its first-order approximation and coin the term "(little) q-function". This function is related to the instantaneous advantage rate function as well as the Hamiltonian. We develop a "q-learning" theory around the q-function that is independent of time discretization. Given a stochastic policy, we jointly characterize the associated q-function and value function by martingale conditions of certain stochastic processes, in both on-policy and off-policy settings. We then apply the theory to devise different actor-critic algorithms for solving underlying RL problems, depending on whether the density function of the Gibbs measure generated from the q-function can be computed explicitly. The q-learning theory generalizes to discretely sampled stochastic policies as well as risk-sensitive RL problems, demonstrating its key role in control and learning theories.

Speaker bio: Dr. Yanwei Jia obtained his Ph.D. degree from the National University of Singapore in 2020 and his B.Sc. from Tsinghua University in 2016. Prior to joining the Department of Systems Engineering and Engineering Management at the Chinese University of Hong Kong in 2023, he was an associate research scientist and adjunct assistant professor in the Department of Industrial Engineering and Operations Research at Columbia University. His research interests fall broadly into financial engineering and decision-making problems, focusing on FinTech and data analytics. His recent research aims to develop fundamental theory on continuous-time reinforcement learning and to solve problems in financial engineering, such as asset allocation and algorithmic trading. He also uses the structural estimation approach to study the information aggregation mechanism and the wisdom of the crowd.

Host: Zongxia Liang (梁宗霞)