🤖 AI Summary
This work addresses infinite-horizon, discounted, discrete-time stationary mean-field games (MFGs) and introduces the first maximum causal entropy inverse reinforcement learning (IRL) framework for MFGs, enabling consistent recovery of population-level reward functions from expert policy data. Methodologically, the MFG is reformulated as a generalized Nash equilibrium problem (GNEP), and the IRL problem is recast as a convex optimization via a linear programming formulation; combined with gradient descent, this yields a provably convergent IRL algorithm and a novel, efficient algorithm for computing mean-field equilibria (MFE). Key contributions: (1) the first theoretically grounded maximum causal entropy IRL framework tailored to MFGs; (2) a GNEP-based solution strategy that provides both convergence guarantees and computational efficiency; and (3) numerical experiments demonstrating accurate reconstruction of expert MFE behavior, validating both effectiveness and generalization.
📝 Abstract
In this paper, we introduce the maximum causal entropy Inverse Reinforcement Learning (IRL) problem for discrete-time mean-field games (MFGs) under an infinite-horizon discounted-reward optimality criterion, where the state space of a typical agent is finite. Our approach begins with a comprehensive review of the maximum entropy IRL problem for deterministic and stochastic Markov decision processes (MDPs) in both finite- and infinite-horizon settings. We then formulate the maximum causal entropy IRL problem for MFGs, which is a non-convex optimization problem with respect to policies. Leveraging the linear programming formulation of MDPs, we recast this IRL problem as a convex optimization problem and develop a gradient descent algorithm that computes the optimal solution with a guaranteed rate of convergence. Finally, by formulating the MFG problem as a generalized Nash equilibrium problem (GNEP), we present a new algorithm for computing the mean-field equilibrium (MFE) of the forward RL problem; this method is used to generate expert data for a numerical example. We note that this novel algorithm is also applicable to general MFE computations.
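The convexification step rests on the classical linear programming formulation of discounted MDPs, in which the decision variable is an occupation measure $\mu(s,a)$ satisfying the flow constraints $\sum_a \mu(s,a) = (1-\gamma)\,\nu_0(s) + \gamma \sum_{s',a'} P(s \mid s',a')\,\mu(s',a')$, and the optimal policy is read off as $\pi(a \mid s) \propto \mu(s,a)$. As a minimal illustration of this standard construction (the toy MDP, function name, and variable names below are our own, not from the paper), one can solve the LP with `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

def solve_mdp_lp(P, r, nu0, gamma):
    """Solve a discounted MDP via its occupation-measure LP.

    P:    (S, A, S) transition tensor, P[s, a, s'] = Pr(s' | s, a)
    r:    (S, A) reward matrix
    nu0:  (S,) initial state distribution
    gamma: discount factor in (0, 1)

    Returns the optimal occupation measure mu (S, A) and a greedy
    deterministic policy extracted from it.
    """
    S, A = r.shape
    # Maximize sum_{s,a} mu(s,a) r(s,a)  <=>  minimize -r . mu
    c = -r.reshape(-1)
    # Flow constraints, one per state s:
    #   sum_a mu(s,a) - gamma * sum_{s',a'} P(s | s',a') mu(s',a') = (1-gamma) nu0(s)
    A_eq = np.zeros((S, S * A))
    for s in range(S):
        for a in range(A):
            A_eq[s, s * A + a] += 1.0
        for sp in range(S):
            for ap in range(A):
                A_eq[s, sp * A + ap] -= gamma * P[sp, ap, s]
    b_eq = (1.0 - gamma) * nu0
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (S * A))
    mu = res.x.reshape(S, A)
    policy = mu.argmax(axis=1)  # optimal action in each state
    return mu, policy
```

With the `(1-gamma)` normalization, `mu` sums to one, so it is a genuine probability distribution over state-action pairs; this is the convex object over which the IRL problem is then posed in place of the (non-convex) policy parametrization.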