🤖 AI Summary
Designing reward functions for heterogeneous multi-agent systems in real-world scenarios is challenging; existing inverse reinforcement learning (IRL) methods for mean-field games (MFGs) assume agent homogeneity and therefore fail to generalize to expert demonstrations drawn from unknown tasks with heterogeneous behaviors.
Method: We propose a deep latent-variable MFG framework coupled with a meta-IRL approach that jointly learns latent states and reward functions via probabilistic context modeling, enabling cross-task reward inference without prior knowledge of the task context, as long as tasks are structurally similar (see the sketch after this summary).
Contribution/Results: Our method achieves interpretable modeling of heterogeneous agent behavior and robust reward recovery while keeping the model architecture fixed. Experiments on synthetic benchmarks and a real-world urban taxi dynamic-pricing task demonstrate significant improvements over state-of-the-art methods, with higher reward-estimation accuracy and better policy-reconstruction fidelity.
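To make the "probabilistic context modeling" concrete, here is a minimal PyTorch sketch of the two components the summary names: a permutation-invariant encoder that infers a Gaussian latent task variable from expert transitions, and a reward network conditioned on the state, action, mean field, and that latent. All class names, dimensions, and the averaging aggregator are our illustrative assumptions, not the paper's implementation.

```python
# Sketch only: a probabilistic context encoder plus a latent-conditioned
# reward. Names, sizes, and aggregation choices are hypothetical.
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Maps a set of expert transitions to a Gaussian posterior over z."""
    def __init__(self, transition_dim, latent_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),  # mean and log-variance
        )

    def forward(self, transitions):  # (num_transitions, transition_dim)
        mu, log_var = self.net(transitions).chunk(2, dim=-1)
        # Permutation-invariant aggregation: average per-transition
        # statistics so the posterior ignores demonstration ordering.
        mu, log_var = mu.mean(dim=0), log_var.mean(dim=0)
        return torch.distributions.Normal(mu, torch.exp(0.5 * log_var))

class LatentConditionedReward(nn.Module):
    """Reward r(s, a, mu_t, z): the mean field mu_t is an input, so one
    fixed architecture covers heterogeneous tasks through z."""
    def __init__(self, state_dim, action_dim, mf_dim, latent_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + mf_dim + latent_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action, mean_field, z):
        return self.net(torch.cat([state, action, mean_field, z], dim=-1))

# Usage: infer z from a new task's demonstrations, then score transitions.
encoder = ContextEncoder(transition_dim=10, latent_dim=4)
reward_fn = LatentConditionedReward(4, 2, 4, 4)
demos = torch.randn(32, 10)       # placeholder expert transitions
z = encoder(demos).rsample()      # reparameterized sample for training
r = reward_fn(torch.randn(1, 4), torch.randn(1, 2),
              torch.randn(1, 4), z.expand(1, -1))
```

Because the latent is inferred from the demonstrations themselves, a new but structurally similar task only requires re-running the encoder, which is what allows cross-task inference without changing the model.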
📝 Abstract
Designing suitable reward functions for numerous interacting intelligent agents is challenging in real-world applications. Inverse reinforcement learning (IRL) in mean field games (MFGs) offers a practical framework to infer reward functions from expert demonstrations. While promising, the assumption of agent homogeneity limits the capability of existing methods to handle demonstrations with heterogeneous and unknown objectives, which are common in practice. To this end, we propose a deep latent variable MFG model and an associated IRL method. Critically, our method can infer rewards from different yet structurally similar tasks without prior knowledge about underlying contexts or modifying the MFG model itself. Our experiments, conducted on simulated scenarios and a real-world spatial taxi-ride pricing problem, demonstrate the superiority of our approach over state-of-the-art IRL methods in MFGs.
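For intuition about the base setting the abstract describes, IRL in a mean-field game, here is a toy sketch under our own simplifying assumptions (a one-shot game over five discrete states, a congestion-style mean-field coupling, and exact visitation gradients); it is not the paper's algorithm. It alternates a soft best response to the current reward, a damped mean-field consistency step, and a maximum-entropy IRL reward update.

```python
# Toy sketch of MaxEnt IRL under a mean-field game; all modeling choices
# (congestion penalty, damping, one-shot dynamics) are our assumptions.
import numpy as np

num_states, lr = 5, 0.5
expert_visitation = np.array([0.4, 0.3, 0.15, 0.1, 0.05])  # placeholder demos
reward = np.zeros(num_states)                      # learned reward table
mean_field = np.full(num_states, 1.0 / num_states)  # population distribution

for _ in range(500):
    # Soft (entropy-regularized) best response to the current reward; the
    # mean field enters through a congestion penalty, a common MFG choice.
    effective_reward = reward - np.log(mean_field + 1e-8)
    policy = np.exp(effective_reward - effective_reward.max())
    policy /= policy.sum()
    # Damped fixed-point step toward mean-field consistency: the population
    # distribution should equal the distribution the policy induces.
    mean_field = 0.5 * mean_field + 0.5 * policy
    # MaxEnt IRL gradient: match model visitations to expert visitations.
    reward += lr * (expert_visitation - mean_field)

print(np.round(mean_field, 3))  # ideally close to expert_visitation
```

The paper's contribution, in these terms, is to make the recovered reward depend on an inferred latent task variable so that a single such loop can serve heterogeneous experts rather than one homogeneous population.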