Unified continuous-time q-learning for mean-field game and mean-field control problems

📅 2024-07-05
🏛️ arXiv.org
📈 Citations: 2
Influential: 1

🤖 AI Summary
This paper studies continuous-time q-learning for mean-field games (MFG) and mean-field control (MFC) in mean-field jump-diffusion models where the population distribution is not directly observable. We propose a unified q-learning framework for both problems. Methodologically, we introduce the integrated q-function in decoupled form (decoupled Iq-function) from the representative agent's perspective and establish its martingale characterization, which gives a single policy evaluation rule for MFG and MFC. The representative agent updates the population distribution from his own state values, and the decoupled Iq-function is then used in different ways to characterize either the mean-field equilibrium policy (MFG) or the mean-field optimal policy (MFC). The resulting algorithm relies on test policies and the averaged martingale orthogonality condition. For several financial applications in the jump-diffusion setting, exact parameterizations of the decoupled Iq-functions and value functions are obtained, and numerical illustrations show satisfactory performance.

📝 Abstract
This paper studies continuous-time q-learning in mean-field jump-diffusion models when the population distribution is not directly observable. We propose the integrated q-function in decoupled form (decoupled Iq-function) from the representative agent's perspective and establish its martingale characterization, which provides a unified policy evaluation rule for both mean-field game (MFG) and mean-field control (MFC) problems. Moreover, we consider a learning procedure in which the representative agent updates the population distribution based on his own state values. Depending on whether the task is to solve the MFG or the MFC problem, we can employ the decoupled Iq-function differently to characterize the mean-field equilibrium policy or the mean-field optimal policy, respectively. Based on these theoretical findings, we devise a unified q-learning algorithm for both MFG and MFC problems by utilizing test policies and the averaged martingale orthogonality condition. For several financial applications in the jump-diffusion setting, we obtain the exact parameterization of the decoupled Iq-functions and the value functions, and illustrate our q-learning algorithm with satisfactory performance.
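The averaged martingale orthogonality condition mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simplified finite-horizon, undiscounted setting in which the candidate martingale increment is the change in the value estimate plus (reward minus q-value) times the time step. The function name `averaged_orthogonality_residual` and all argument names are hypothetical. Any dependence on the population distribution is assumed to be already folded into the input arrays. The exact form of the decoupled Iq-function is given in the paper, not here.

```python
import numpy as np


def averaged_orthogonality_residual(values, rewards, q_vals, test_vals, dt):
    """Path-averaged, time-discretized martingale orthogonality residual.

    Assumed shapes (illustrative only):
      values:    (n_paths, n_steps + 1) value-function estimates along each path
      rewards:   (n_paths, n_steps)     running rewards on each time step
      q_vals:    (n_paths, n_steps)     q-function estimates on each time step
      test_vals: (n_paths, n_steps)     test-function values on each time step
      dt:        time-step size
    """
    values = np.asarray(values, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    q_vals = np.asarray(q_vals, dtype=float)
    test_vals = np.asarray(test_vals, dtype=float)

    # Increment of the candidate martingale on each step (assumed simplified form).
    increments = np.diff(values, axis=1) + (rewards - q_vals) * dt

    # Weight increments by the test function, sum over time, average over paths.
    return float(np.mean(np.sum(test_vals * increments, axis=1)))
```

In a learning loop, parameters of the value and q-function approximators would be adjusted to drive this residual toward zero for a chosen family of test functions. That loop is omitted here.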
Problem

Research questions and friction points this paper is trying to address.

Develops continuous-time q-learning for mean-field jump-diffusion models with an unobservable population distribution
Unifies policy evaluation for mean-field game and mean-field control problems
Devises a unified q-learning algorithm and illustrates it on financial applications in the jump-diffusion setting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoupled Iq-function for unified policy evaluation
Representative agent updates the population distribution from his own state values
Unified q-learning algorithm for MFG and MFC
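
The idea that the representative agent updates the population distribution from his own state values can be sketched as follows. This is an illustration only, under the strong assumption that the population distribution is summarized by its mean and that the update is a simple stochastic-approximation step. The function `update_population_mean` and the `step_size` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def update_population_mean(mu_est, own_states, step_size):
    """Move the current estimate of the population mean toward the
    sample mean of the agent's own observed states (assumed update rule)."""
    sample_mean = float(np.mean(own_states))
    return mu_est + step_size * (sample_mean - mu_est)


# Usage: start from an initial guess and refine it with the agent's own samples.
mu = 0.0
mu = update_population_mean(mu, own_states=[1.0, 3.0], step_size=0.5)
```

With `step_size=1.0` the estimate is replaced by the sample mean outright; smaller values average information across successive batches of observations.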