🤖 AI Summary
Conventional learning methods for mean field games (MFGs) in large-population multi-agent systems rely on forward-backward fixed-point iteration (FPI), which can be inefficient and unstable because of oscillations. This paper instead treats the policy and the population distribution as a single unified parameter and updates both simultaneously and fully asynchronously, yielding SemiSGD, a simple SGD-type method. Applying linear function approximation to the unified parameter gives population-aware linear function approximation (PA-LFA), presented as the first population-aware LFA for learning MFGs on continuous state-action spaces. The finite-time convergence analysis of SemiSGD with PA-LFA covers convergence to the equilibrium for linear MFGs under the standard contractivity condition and convergence to a neighborhood of the equilibrium under a more practical condition; it also characterizes the approximation error for non-linear MFGs. Six experiments on three MFGs validate the theoretical findings.
📝 Abstract
Mean field games (MFGs) model interactions in large-population multi-agent systems through population distributions. Traditional learning methods for MFGs are based on fixed-point iteration (FPI), where policy updates and induced population distributions are computed separately and sequentially. However, FPI-type methods may suffer from inefficiency and instability due to potential oscillations caused by this forward-backward procedure. In this work, we propose a novel perspective that treats the policy and population as a unified parameter controlling the game dynamics. By applying stochastic parameter approximation to this unified parameter, we develop SemiSGD, a simple stochastic gradient descent (SGD)-type method, where an agent updates its policy and population estimates simultaneously and fully asynchronously. Building on this perspective, we further apply linear function approximation (LFA) to the unified parameter, resulting in the first population-aware LFA (PA-LFA) for learning MFGs on continuous state-action spaces. A comprehensive finite-time convergence analysis is provided for SemiSGD with PA-LFA, including its convergence to the equilibrium for linear MFGs -- a class of MFGs with a linear structure concerning the population -- under the standard contractivity condition, and to a neighborhood of the equilibrium under a more practical condition. We also characterize the approximation error for non-linear MFGs. We validate our theoretical findings with six experiments on three MFGs.
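The simultaneous update described in the abstract can be illustrated with a minimal sketch. This is not the authors' SemiSGD or PA-LFA implementation: it assumes a hypothetical five-state ring game with a crowd-aversion reward, uses a tabular action-value table instead of linear function approximation, and substitutes a SARSA-style semi-gradient TD step for the policy update. The function name `simultaneous_update_toy` and all parameters are illustrative assumptions. What it shows is the structural idea only: along one trajectory, the value estimate `Q` and the population estimate `M` are updated in the same step, rather than alternating a full policy solve with a full population solve as in FPI.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def simultaneous_update_toy(n_states=5, steps=5000, alpha=0.05, gamma=0.9, seed=0):
    """Toy single-trajectory loop updating Q and M together (hypothetical game)."""
    rng = np.random.default_rng(seed)
    moves = np.array([-1, 0, 1])              # move left, stay, move right on a ring
    Q = np.zeros((n_states, len(moves)))      # action-value estimate (policy side)
    M = np.full(n_states, 1.0 / n_states)     # population estimate (a probability vector)

    s = int(rng.integers(n_states))
    a = int(rng.choice(len(moves), p=softmax(Q[s])))
    for _ in range(steps):
        # Assumed reward: avoid crowded states, pay a small cost for moving.
        r = -M[s] - 0.1 * abs(moves[a])
        s_next = (s + moves[a]) % n_states
        a_next = int(rng.choice(len(moves), p=softmax(Q[s_next])))

        # Policy side: semi-gradient TD step (the bootstrap target is not differentiated).
        Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

        # Population side: stochastic-approximation step toward the visited state.
        onehot = np.zeros(n_states)
        onehot[s_next] = 1.0
        M += alpha * (onehot - M)

        s, a = s_next, a_next
    return Q, M


if __name__ == "__main__":
    Q, M = simultaneous_update_toy()
    print("population estimate:", np.round(M, 3))
```

Because each population step is a convex combination of the current estimate and a one-hot vector, `M` remains a valid probability distribution throughout. An FPI-style baseline would instead freeze `M`, solve for a policy, then recompute `M` from that policy, which is the forward-backward loop the abstract identifies as a source of oscillation.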