
Seminars and Lectures

[Departmental Academic Seminar] 2025, No. 18 || Continuous-time q-learning for mean-field control problems with common noise

Title: Continuous-time q-learning for mean-field control problems with common noise

Speaker: Xiang Yu (The Hong Kong Polytechnic University)

Time: 9:30-11:30 am, Tuesday, June 17

Venue: A404, Science Building (理科楼)

Abstract: This talk investigates continuous-time entropy-regularized reinforcement learning (RL) for mean-field control problems with controlled common noise. We study the continuous-time counterpart of the Q-function in the mean-field model, coined the q-function in the single-agent model. It is shown that the controlled common noise gives rise to a double integral term in the exploratory dynamic programming equation, rendering the policy improvement iteration intricate. The policy improvement at each iteration can be characterized by a first-order condition using the notion of the partial linear derivative in the policy. To devise model-free RL algorithms, we introduce the integrated q-function (Iq-function) on distributions of both state and action, and an optimal policy can be identified as a two-layer fixed point of the soft argmax operator of the Iq-function. The martingale characterization of the value function and the Iq-function is established by exhausting all test policies. This allows us to propose several algorithms, including an Actor-Critic q-learning algorithm in which the policy is updated in the Actor step based on the policy improvement rule induced by the partial linear derivative of the Iq-function, while the value function and the Iq-function are updated simultaneously in the Critic step based on the martingale orthogonality condition. In two examples, within and beyond the LQ-control framework, we implement and compare our algorithms with satisfactory performance.
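For orientation, the display below sketches, in the simpler single-agent setting without common noise or mean-field interaction, the two ingredients named in the abstract: the soft-argmax (Gibbs) policy induced by a q-function under entropy regularization, and the martingale orthogonality condition used in the Critic step. The temperature parameter \gamma and the test process \xi are assumed standard notation and are not taken from the talk; in the talk's setting the Iq-function additionally depends on the joint distribution of state and action.

% Sketch only: single-agent entropy-regularized q-learning, without common noise
% or mean-field terms; gamma (temperature) and xi (test process) are assumed
% notation. Requires the amsmath and amssymb packages.
\begin{align*}
  % Soft-argmax (Gibbs) policy improvement induced by a q-function:
  \pi(a \mid t, x)
    &= \frac{\exp\big(q(t,x,a)/\gamma\big)}
            {\int_{\mathcal{A}} \exp\big(q(t,x,a')/\gamma\big)\,\mathrm{d}a'}, \\
  % Critic step: martingale orthogonality against any bounded adapted test
  % process \xi, where J is the value function and (X_t, a_t) a sampled trajectory:
  0 &= \mathbb{E}\!\left[\int_0^T \xi_t\,\Big(\mathrm{d}J(t, X_t)
        + \big(r(t, X_t, a_t) - q(t, X_t, a_t)\big)\,\mathrm{d}t\Big)\right].
\end{align*}

In implementations of this kind, J and q are parameterized, and a common choice in this literature is to take \xi to be the gradients of those parameterizations, which turns the orthogonality condition into stochastic-approximation updates for the Critic step.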

Bio: Prof. Xiang Yu is currently an Associate Professor in the Department of Applied Mathematics at The Hong Kong Polytechnic University. Prior to this, he served as an Assistant Professor in the same department and as a Postdoctoral Assistant Professor in the Department of Mathematics at the University of Michigan. Prof. Yu received his Ph.D. degree in Mathematics from the University of Texas at Austin in 2012 and his B.S. degree in Mathematics from Huazhong University of Science and Technology in 2007. His research interests lie at the intersection of quantitative finance, stochastic analysis, stochastic control, and reinforcement learning. He has published extensively in top journals such as Mathematical Finance, Finance and Stochastics, Annals of Applied Probability, Mathematics of Operations Research, SIAM Journal on Control and Optimization, Stochastic Processes and their Applications, and Automatica.

Host: 梁宗霞 (Zongxia Liang)


