Recurrent Structural Policy Gradient for Partially Observable Mean Field Games

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of efficient, low-variance algorithms that support history-dependent policies in partially observable mean-field games. The authors propose a recurrent policy gradient method that combines Monte Carlo sampling of the common noise with exact estimation of the expected return conditioned on those samples. It is the first hybrid structural method with history-aware capabilities, combining known environment dynamics, recurrent neural networks, and structured policy gradients to handle complex scenarios involving heterogeneous agents, common noise, and history-dependent strategies. Implemented in the authors' JAX-based MFAX framework, the method achieves state-of-the-art performance on a macroeconomic mean-field game benchmark, converging an order of magnitude faster and providing the first successful solution to this class of challenging problems.

📝 Abstract
Mean Field Games (MFGs) provide a principled framework for modeling interactions in large populations: at scale, population dynamics become deterministic, with uncertainty entering only through aggregate shocks, or common noise. However, algorithmic progress has been limited, since model-free methods suffer from high variance and exact methods scale poorly. Recent Hybrid Structural Methods (HSMs) use Monte Carlo rollouts for the common noise in combination with exact estimation of the expected return conditioned on those samples. However, HSMs have not been scaled to partially observable settings. We propose Recurrent Structural Policy Gradient (RSPG), the first history-aware HSM for settings involving public information. We also introduce MFAX, our JAX-based framework for MFGs. By leveraging known transition dynamics, RSPG achieves state-of-the-art performance with an order-of-magnitude faster convergence and solves, for the first time, a macroeconomic MFG with heterogeneous agents, common noise, and history-aware policies. MFAX is publicly available at: https://github.com/CWibault/mfax.
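The hybrid structural idea in the abstract can be illustrated with a toy sketch: sample common-noise trajectories by Monte Carlo, then, conditioned on each sample, propagate the mean-field distribution deterministically through known transition dynamics and compute the expected return exactly, with a recurrent summary of the public noise history feeding the policy. Everything below (the dynamics, the two-regime kernel, the scalar recurrence) is invented for illustration and is not the paper's model or the MFAX API.

```python
# Hypothetical sketch of a hybrid structural rollout with a history-aware
# policy: common noise is Monte Carlo sampled, while the population
# distribution evolves deterministically under known dynamics.
import numpy as np

rng = np.random.default_rng(0)

S, H, K = 5, 10, 32                      # states, horizon, noise samples
T0 = rng.dirichlet(np.ones(S), size=S)   # known transition kernel, regime 0
T1 = rng.dirichlet(np.ones(S), size=S)   # known transition kernel, regime 1
reward = rng.normal(size=S)              # per-state reward

# A tiny recurrent "policy": a scalar hidden state summarises the public
# common-noise history (a stand-in for an RNN over observations).
W_h, W_x = 0.5, 1.0

def rollout(noise_seq, mu0):
    """Exact expected return for one sampled common-noise trajectory."""
    mu, h, ret = mu0, 0.0, 0.0
    for z in noise_seq:
        h = np.tanh(W_h * h + W_x * z)   # recurrent history summary
        mix = 1.0 / (1.0 + np.exp(-h))   # history-dependent action
        T = mix * T1 + (1.0 - mix) * T0  # policy-tilted known dynamics
        mu = mu @ T                      # deterministic mean-field step
        ret += float(mu @ reward)        # exact expectation over states
    return ret

mu0 = np.ones(S) / S
noise = rng.integers(0, 2, size=(K, H))  # K sampled common-noise paths
value = np.mean([rollout(z, mu0) for z in noise])
print(f"Monte Carlo estimate over common noise: {value:.3f}")
```

The only Monte Carlo variance here comes from the outer average over noise paths; everything inside `rollout` is an exact expectation, which is the variance-reduction argument the abstract makes for HSMs.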
Problem

Research questions and friction points this paper is trying to address.

Partially Observable Mean Field Games
Hybrid Structural Methods
Common Noise
History-aware Policies
Scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Recurrent Structural Policy Gradient
Partially Observable Mean Field Games
Hybrid Structural Methods
Common Noise
History-aware Policies
Clarisse Wibault
FLAIR, Foerster Lab for AI Research, University of Oxford
Johannes Forkel
PostDoc in Machine Learning, PhD in Mathematics, University of Oxford
Multi-Agent Reinforcement Learning · Random Matrix Theory · Mathematical Physics
Sebastian Towers
FLAIR, Foerster Lab for AI Research, University of Oxford
Tiphaine Wibault
ifo Institute, Ludwig-Maximilians-Universität Munich
Juan Duque
Mila, Québec AI Institute
George Whittle
University of Oxford
Machine Learning · Approximate Inference · Deep Learning · Quantum Device Control
Andreas Schaab
UC Berkeley
Yucheng Yang
University of Zurich and Swiss Finance Institute
Macroeconomics · Finance · Machine Learning · Computational Economics · Monetary Economics
Chiyuan Wang
Peking University
Michael Osborne
MLRG, Machine Learning Research Group, University of Oxford
Benjamin Moll
Sir John Hicks Professor of Economics, London School of Economics
Macroeconomics · Inequality
Jakob Foerster
Associate Professor, University of Oxford
Artificial Intelligence