🤖 AI Summary
This work addresses the challenge of multi-agent imitation learning in Markov games, where performance often degrades due to state-action distribution shift under overly restrictive concentrability assumptions. Focusing on linear Markov games, the authors introduce a novel concentrability condition defined in feature space, substantially relaxing the coverage required of the expert. Building on this, they propose the first computationally efficient interactive multi-agent imitation learning algorithm whose guarantees require no concentrability coefficient at all. Theoretical analysis shows that the sample complexity depends only on the feature dimension $d$, independent of the size of the state-action space. Empirical evaluations on benchmark games such as Tic-Tac-Toe and Connect4 demonstrate that the proposed deep interactive algorithm significantly outperforms behavioral cloning.
📄 Abstract
In this work, we present the first theoretical analysis of multi-agent imitation learning (MAIL) in linear Markov games, where both the transition dynamics and each agent's reward function are linear in some given features. We demonstrate that by leveraging this structure, it is possible to replace the state-action level "all policy deviation concentrability coefficient" (Freihaut et al., arXiv:2510.09325) with a concentrability coefficient defined at the feature level, which can be much smaller than its state-action analog when the features are informative about state similarity. Furthermore, to circumvent the need for any concentrability coefficient, we turn to the interactive setting. We provide the first computationally efficient interactive MAIL algorithm for linear Markov games and show that its sample complexity depends only on the dimension $d$ of the feature map. Building on these theoretical findings, we propose a deep interactive MAIL algorithm that clearly outperforms behavioral cloning (BC) on games such as Tic-Tac-Toe and Connect4.
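For readers unfamiliar with the linearity assumption above, it can be sketched in the standard linear-MDP convention (the symbols $\phi$, $\mu$, and $\theta_i$ below follow that common convention and are not necessarily the paper's exact notation): a known $d$-dimensional feature map $\phi$ over state and joint-action pairs parameterizes both the transition kernel and every agent's reward.

```latex
% Linear Markov game structure (standard linear-MDP notation, assumed here):
% \phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d is a known feature map,
% \mu is an (unknown) vector of measures over next states, and
% \theta_i \in \mathbb{R}^d is agent i's (unknown) reward parameter.
P(s' \mid s, \boldsymbol{a}) = \langle \phi(s, \boldsymbol{a}), \mu(s') \rangle,
\qquad
r_i(s, \boldsymbol{a}) = \langle \phi(s, \boldsymbol{a}), \theta_i \rangle .
```

Under this structure, quantities that would otherwise scale with the number of state-action pairs can instead be measured in the $d$-dimensional feature space, which is what makes a feature-level concentrability coefficient, and ultimately a coefficient-free interactive guarantee, possible.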