🤖 AI Summary
This work addresses the computational and statistical challenges of offline learning of Nash equilibria (NE) in α-potential games by proposing a data coverage framework based on KL regularization and reference policy anchoring, along with a decentralized Offline Potential Mirror Descent (OPMD) algorithm. The method achieves, for the first time in α-potential games, a fast statistical convergence rate of Õ(1/n), breaking through the prevailing Õ(1/√n) bottleneck in offline multi-agent learning. Both theoretical analysis and empirical experiments demonstrate that OPMD significantly outperforms existing approaches in terms of convergence speed and sample efficiency, thereby establishing a new benchmark for offline equilibrium learning in α-potential games.
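To make the gap between the two rates concrete, the following is a minimal sketch that evaluates the leading terms 1/n and 1/√n at a given sample size. It assumes unit constants and drops the logarithmic factors hidden by the Õ notation. The function names are illustrative and not taken from the paper.

```python
import math

def fast_rate_bound(n: int) -> float:
    """Leading term of an O~(1/n) bound (unit constant, log factors dropped)."""
    return 1.0 / n

def slow_rate_bound(n: int) -> float:
    """Leading term of an O~(1/sqrt(n)) bound (unit constant, log factors dropped)."""
    return 1.0 / math.sqrt(n)

# Compare the two leading terms at a few illustrative dataset sizes.
for n in (100, 10_000, 1_000_000):
    print(n, fast_rate_bound(n), slow_rate_bound(n))
```

Under these simplifications, reaching a target error ε takes on the order of 1/ε samples at the fast rate and 1/ε² samples at the slow rate.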
📝 Abstract
An $\alpha$-potential game is a multi-player non-cooperative interaction in which a global potential function approximates individual player rewards up to a structural bias $\alpha$. While identifying a Nash equilibrium (NE) in general-sum games is known to be computationally intractable, the potential game structure enables tractable NE identification. In this paper, we study the offline learning of NE in $\alpha$-potential games using KL regularization. To analyze this process, we propose a novel Reference-Anchored offline data coverage framework: a verifiable condition that anchors data requirements to a known reference policy rather than an unknown optimum. Building on this, we propose Offline Potential Mirror Descent (OPMD), a decentralized algorithm that achieves an accelerated $\widetilde{\mathcal{O}}(1/n)$ statistical rate, surpassing the standard $\widetilde{\mathcal{O}}(1/\sqrt{n})$ rate typical of offline multi-agent learning. This work establishes the first fast-rate offline learning approach for $\alpha$-potential games.
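The notion of a potential that matches player rewards "up to a structural bias" can be illustrated on a small normal-form game. The sketch below assumes one common formalization, which the abstract does not spell out: α is the largest mismatch, over all players and unilateral deviations, between the change in a player's reward and the change in the potential. The helper `potential_gap` and the example payoffs are hypothetical and serve only as illustration.

```python
from itertools import product

def potential_gap(payoffs, potential, n_actions):
    """Smallest alpha for which `potential` is an alpha-potential of the game.

    Assumed definition: the maximum, over players i and unilateral deviations,
    of |(u_i(b) - u_i(a)) - (phi(b) - phi(a))|, where b differs from a only
    in player i's action.
    payoffs[i][a] is player i's reward at joint action a (a tuple of ints);
    potential[a] is the candidate potential value at a.
    """
    alpha = 0.0
    for a in product(*(range(k) for k in n_actions)):
        for i, k in enumerate(n_actions):
            for dev in range(k):
                b = a[:i] + (dev,) + a[i + 1:]  # player i deviates to `dev`
                du = payoffs[i][b] - payoffs[i][a]
                dphi = potential[b] - potential[a]
                alpha = max(alpha, abs(du - dphi))
    return alpha

# Hypothetical 2x2 coordination game: both players receive the potential itself.
phi = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}
exact_game = [dict(phi), dict(phi)]

# Perturb player 2's reward at one joint action, so the game is only
# approximately a potential game with respect to phi.
perturbed_game = [dict(phi), dict(phi)]
perturbed_game[1][(1, 1)] += 0.1

print(potential_gap(exact_game, phi, (2, 2)))
print(potential_gap(perturbed_game, phi, (2, 2)))
```

With identical rewards the gap is zero, which is the exact potential game case. The perturbed game has a gap equal to the size of the perturbation, up to floating-point rounding.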