Kernel Based Maximum Entropy Inverse Reinforcement Learning for Mean-Field Games

📅 2025-07-19

📈 Citations: 0

✨ Influential: 0

career value

223K/year

🤖 AI Summary

This paper addresses inverse reinforcement learning (IRL) in infinite-horizon stationary mean-field games (MFGs), moving beyond conventional linear reward assumptions. Methodologically, it introduces the first kernel-based IRL framework for MFGs, leveraging a reproducing kernel Hilbert space (RKHS) to nonparametrically model arbitrarily complex nonlinear reward functions. By characterizing mean-field equilibria via the soft Bellman equation and employing Fréchet differentiability analysis, the log-likelihood maximization is reformulated as an unconstrained gradient ascent optimization. Theoretically, the algorithm is proven to converge and achieve statistical consistency. Empirical evaluation on a mean-field traffic routing game demonstrates that the proposed method accurately recovers expert policies with significantly higher fidelity than existing approaches relying on linear basis-function reward representations.

Technology Category

Application Category

📝 Abstract

We consider the maximum causal entropy inverse reinforcement learning problem for infinite-horizon stationary mean-field games, in which we model the unknown reward function within a reproducing kernel Hilbert space. This allows the inference of rich and potentially nonlinear reward structures directly from expert demonstrations, in contrast to most existing inverse reinforcement learning approaches for mean-field games that typically restrict the reward function to a linear combination of a fixed finite set of basis functions. We also focus on the infinite-horizon cost structure, whereas prior studies primarily rely on finite-horizon formulations. We introduce a Lagrangian relaxation to this maximum causal entropy inverse reinforcement learning problem that enables us to reformulate it as an unconstrained log-likelihood maximization problem, and obtain a solution lk{via} a gradient ascent algorithm. To illustrate the theoretical consistency of the algorithm, we establish the smoothness of the log-likelihood objective by proving the Fréchet differentiability of the related soft Bellman operators with respect to the parameters in the reproducing kernel Hilbert space. We demonstrate the effectiveness of our method on a mean-field traffic routing game, where it accurately recovers expert behavior.

Problem

Research questions and friction points this paper is trying to address.

Infer nonlinear reward functions in mean-field games

Address infinite-horizon cost structures in reinforcement learning

Develop gradient-based solution for entropy maximization problem

Innovation

Methods, ideas, or system contributions that make the work stand out.

Kernel-based reward modeling in Hilbert space

Lagrangian relaxation for unconstrained optimization

Gradient ascent for log-likelihood maximization

🔎 Similar Papers

Maximum Causal Entropy Inverse Reinforcement Learning for Mean-Field Games