🤖 AI Summary
This work addresses infinite-horizon, discounted, discrete-time stationary mean-field games (MFGs) and introduces the first maximum causal entropy inverse reinforcement learning (IRL) framework for MFGs, enabling consistent recovery of population-level reward functions from expert policy data. Methodologically, the MFG is reformulated as a generalized Nash equilibrium problem (GNEP), and the IRL problem is recast as a convex optimization via a linear programming formulation; combined with gradient descent, this yields a provably convergent IRL algorithm and a novel, efficient algorithm for computing mean-field equilibria (MFE). Key contributions: (1) the first theoretically grounded maximum causal entropy IRL framework tailored to MFGs; (2) a GNEP-based solution strategy that provides both convergence guarantees and computational efficiency; and (3) numerical experiments demonstrating accurate reconstruction of expert MFE behavior, validating both effectiveness and generalization.
📝 Abstract
In this paper, we introduce the maximum causal entropy Inverse Reinforcement Learning (IRL) problem for discrete-time mean-field games (MFGs) under an infinite-horizon discounted-reward optimality criterion, where the state space of a typical agent is finite. Our approach begins with a comprehensive review of the maximum entropy IRL problem for deterministic and stochastic Markov decision processes (MDPs) in both finite- and infinite-horizon settings. We then formulate the maximum causal entropy IRL problem for MFGs, which is a non-convex optimization problem with respect to policies. Leveraging the linear programming formulation of MDPs, we recast this IRL problem as a convex optimization problem and develop a gradient descent algorithm that computes the optimal solution with a guaranteed rate of convergence. Finally, by formulating the MFG problem as a generalized Nash equilibrium problem (GNEP), we present a new algorithm for computing the mean-field equilibrium (MFE) of the forward RL problem; this method is used to generate expert data for a numerical example. We note that this novel algorithm is also applicable to general MFE computations.
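The convexification step rests on the classical linear programming formulation of discounted MDPs, in which the decision variable is an occupation measure $\mu(s,a)$ satisfying the flow constraints $\sum_a \mu(s,a) = (1-\gamma)\,\nu_0(s) + \gamma \sum_{s',a'} P(s \mid s',a')\,\mu(s',a')$, and the optimal policy is read off as $\pi(a \mid s) \propto \mu(s,a)$. As a minimal illustration of this standard construction (the toy MDP, function name, and variable names below are our own, not from the paper), one can solve the LP with `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

def solve_mdp_lp(P, r, nu0, gamma):
    """Solve a discounted MDP via its occupation-measure LP.

    P:    (S, A, S) transition tensor, P[s, a, s'] = Pr(s' | s, a)
    r:    (S, A) reward matrix
    nu0:  (S,) initial state distribution
    gamma: discount factor in (0, 1)

    Returns the optimal occupation measure mu (S, A) and a greedy
    deterministic policy extracted from it.
    """
    S, A = r.shape
    # Maximize sum_{s,a} mu(s,a) r(s,a)  <=>  minimize -r . mu
    c = -r.reshape(-1)
    # Flow constraints, one per state s:
    #   sum_a mu(s,a) - gamma * sum_{s',a'} P(s | s',a') mu(s',a') = (1-gamma) nu0(s)
    A_eq = np.zeros((S, S * A))
    for s in range(S):
        for a in range(A):
            A_eq[s, s * A + a] += 1.0
        for sp in range(S):
            for ap in range(A):
                A_eq[s, sp * A + ap] -= gamma * P[sp, ap, s]
    b_eq = (1.0 - gamma) * nu0
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (S * A))
    mu = res.x.reshape(S, A)
    policy = mu.argmax(axis=1)  # optimal action in each state
    return mu, policy
```

With the `(1-gamma)` normalization, `mu` sums to one, so it is a genuine probability distribution over state-action pairs; this is the convex object over which the IRL problem is then posed in place of the (non-convex) policy parametrization.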