Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning

📅 2025-05-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper studies non-interactive multi-agent imitation learning, where the goal is to recover an ε-Nash equilibrium of a Markov game from expert demonstrations. It introduces the *single-policy deviation concentrability coefficient*, shows that this quantity is unavoidable in the non-interactive setting, and derives an upper bound for behavioral cloning (BC) featuring it; BC can incur substantial regret in games where the coefficient is large. Building on this, the authors propose MAIL-BRO, which uses a best-response oracle and converges to an ε-Nash equilibrium with O(ε⁻⁴) expert and oracle queries, and MURMAIL, an oracle-free variant requiring O(ε⁻⁸) expert queries. Key contributions include: (i) the first expert sample complexity characterization for learning a Nash equilibrium from expert data; (ii) a concentrability coefficient shown to be unavoidable in the non-interactive setting; and (iii) two algorithms with provable query-complexity guarantees, supported by numerical evidence.

📝 Abstract
This paper provides the first expert sample complexity characterization for learning a Nash equilibrium from expert data in Markov Games. We show that a new quantity, named the single policy deviation concentrability coefficient, is unavoidable in the non-interactive imitation learning setting, and we provide an upper bound for behavioral cloning (BC) featuring such coefficient. BC exhibits substantial regret in games with a high concentrability coefficient, leading us to utilize expert queries to develop and introduce two novel solution algorithms: MAIL-BRO and MURMAIL. The former employs a best response oracle and learns an $\varepsilon$-Nash equilibrium with $\mathcal{O}(\varepsilon^{-4})$ expert and oracle queries. The latter completely bypasses the best response oracle at the cost of a worse expert query complexity of order $\mathcal{O}(\varepsilon^{-8})$. Finally, we provide numerical evidence confirming our theoretical findings.
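To make the ε-Nash equilibrium target concrete, here is a minimal sketch (illustrative only, not the paper's algorithm) of the exploitability, or best-response gap, of a strategy profile in a two-player matrix game; a profile is an ε-Nash equilibrium exactly when this gap is at most ε:

```python
import numpy as np

def nash_gap(A, B, x, y):
    """Best-response gap of profile (x, y) in a bimatrix game.

    A, B: payoff matrices for players 1 and 2 (same shape).
    x, y: mixed strategies (probability vectors over rows / columns).
    Returns the largest gain any single player obtains by deviating
    unilaterally; (x, y) is an eps-Nash equilibrium iff the gap <= eps.
    """
    v1 = x @ A @ y        # player 1's current expected payoff
    v2 = x @ B @ y        # player 2's current expected payoff
    br1 = np.max(A @ y)   # player 1's best pure response to y
    br2 = np.max(x @ B)   # player 2's best pure response to x
    return max(br1 - v1, br2 - v2)

# Matching pennies: the uniform profile is an exact Nash equilibrium.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
B = -A
x = y = np.array([0.5, 0.5])
print(nash_gap(A, B, x, y))  # 0.0
```

The single-policy deviation structure in the paper generalizes exactly this notion of unilateral deviation to Markov games with more players and dynamics.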
Problem

Research questions and friction points this paper is trying to address.

Characterize expert sample complexity for Nash equilibrium learning
Establish the necessity of the single policy deviation concentrability coefficient
Develop MAIL-BRO and MURMAIL algorithms for efficient equilibrium learning
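As context for the behavioral cloning baseline analyzed in the abstract, a minimal tabular sketch follows (the state and action names are hypothetical, for illustration only):

```python
from collections import Counter, defaultdict

def behavioral_cloning(demos, actions):
    """Tabular behavioral cloning: empirical expert action frequencies.

    demos: iterable of (state, action) pairs from expert trajectories.
    actions: list of all actions (unseen actions get probability 0).
    Returns a dict mapping state -> {action: probability}.
    """
    counts = defaultdict(Counter)
    for s, a in demos:
        counts[s][a] += 1
    policy = {}
    for s, c in counts.items():
        n = sum(c.values())
        policy[s] = {a: c[a] / n for a in actions}
    return policy

# Hypothetical demonstrations: at s0 the expert mostly goes left.
demos = [("s0", "left"), ("s0", "left"), ("s0", "right"), ("s1", "right")]
pi = behavioral_cloning(demos, ["left", "right"])
print(pi["s0"]["left"])  # 2/3 of the demonstrations at s0 choose "left"
```

The paper's point is that such a purely offline estimator inherits the single policy deviation concentrability coefficient in its regret bound, which motivates the query-based algorithms MAIL-BRO and MURMAIL.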
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces single policy deviation concentrability coefficient
Develops MAIL-BRO with O(ε⁻⁴) query complexity
Proposes MURMAIL, which avoids the best response oracle entirely