Beyond Pessimism: Offline Learning in KL-regularized Games

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the suboptimal statistical rates of offline learning in KL-regularized two-player zero-sum games, where distribution shift between the dataset and the target policies is the central obstacle. The authors propose a novel algorithm and analysis framework that eschew pessimistic value estimation, instead leveraging the smoothness of KL-regularized best responses and the skew-symmetric stability of the Nash equilibrium. In the offline setting, their method achieves a sample complexity of Õ(1/n), breaking the conventional Õ(1/√n) barrier. The proposed self-play algorithm needs only a number of iterations linear in the sample size to match the statistical efficiency of the minimax estimator, substantially improving sample utilization. This is the first fast-rate result for offline learning in KL-regularized zero-sum games obtained without pessimism-based mechanisms.
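To fix notation for the setting the summary describes: in a KL-regularized two-player zero-sum game, each player's policy is anchored to a fixed reference policy through a KL penalty. A standard form of the objective (our notation, not necessarily the paper's) is:

```latex
\max_{\mu}\;\min_{\nu}\;
\mathbb{E}_{a\sim\mu,\; b\sim\nu}\!\left[r(a,b)\right]
\;-\;\beta\,\mathrm{KL}\!\left(\mu \,\|\, \mu_{\mathrm{ref}}\right)
\;+\;\beta\,\mathrm{KL}\!\left(\nu \,\|\, \nu_{\mathrm{ref}}\right)
```

Here μ and ν are the max and min players' policies, μ_ref and ν_ref the fixed reference policies, and β > 0 the regularization strength. The regularization is what makes best responses smooth in the opponent's policy, the property the analysis exploits.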
📝 Abstract
We study offline learning in KL-regularized two-player zero-sum games, where policies are optimized under a KL constraint to a fixed reference policy. Prior work relies on pessimistic value estimation to handle distribution shift, yielding only $\widetilde{\mathcal{O}}(1/\sqrt n)$ statistical rates. We develop a new pessimism-free algorithm and analytical framework for KL-regularized games, built on the smoothness of KL-regularized best responses and a stability property of the Nash equilibrium induced by skew symmetry. This yields the first $\widetilde{\mathcal{O}}(1/n)$ sample complexity bound for offline learning in KL-regularized zero-sum games, achieved entirely without pessimism. We further propose an efficient self-play policy optimization algorithm and prove that, with a number of iterations linear in the sample size, it achieves the same fast $\widetilde{\mathcal{O}}(1/n)$ statistical rate as the minimax estimator.
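To make the self-play setup in the abstract concrete, here is a minimal sketch on a toy zero-sum matrix game with known payoffs. The payoff matrix, uniform-free reference policies, damping scheme, and β below are our illustrative choices; this shows only the general mechanics of KL-regularized self-play, not the paper's offline algorithm.

```python
import numpy as np

def kl_best_response(payoff_vec, ref, beta):
    # KL-regularized best response: an exponential tilt of the
    # reference policy, proportional to ref * exp(payoff / beta).
    logits = np.log(ref) + payoff_vec / beta
    w = np.exp(logits - logits.max())  # shift for numerical stability
    return w / w.sum()

# Toy zero-sum game: rock-paper-scissors payoff for the row (max) player.
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])
beta = 1.0                          # strength of the KL regularizer
mu_ref = np.array([0.5, 0.3, 0.2])  # reference (anchor) policies
nu_ref = np.array([0.2, 0.3, 0.5])
mu, nu = mu_ref.copy(), nu_ref.copy()

# Damped self-play: each player moves halfway toward its KL-regularized
# best response against the other player's current policy.
for _ in range(200):
    mu_new = kl_best_response(A @ nu, mu_ref, beta)       # max player
    nu_new = kl_best_response(-(A.T @ mu), nu_ref, beta)  # min player
    mu, nu = 0.5 * (mu + mu_new), 0.5 * (nu + nu_new)
```

Because the regularized best response is a contraction here, the iterates converge to the regularized Nash equilibrium, where the duality gap of the regularized game vanishes.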
Problem

Research questions and friction points this paper is trying to address.

- offline learning
- KL-regularized games
- zero-sum games
- sample complexity
- distribution shift
Innovation

Methods, ideas, or system contributions that make the work stand out.

- offline learning
- KL regularization
- zero-sum games
- pessimism-free
- sample complexity