Beyond Pessimism: Offline Learning in KL-regularized Games

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the suboptimal statistical rates of offline learning in KL-regularized two-player zero-sum games, where distribution shift between the dataset and the target policies is the central obstacle. The authors propose a novel algorithm and analysis framework that eschew pessimistic value estimation, instead leveraging the smoothness of KL-regularized best responses and the skew-symmetric stability of the Nash equilibrium. In the offline setting, their method achieves a sample complexity of Õ(1/n), breaking the conventional Õ(1/√n) barrier. The proposed self-play algorithm needs only a number of iterations linear in the sample size to match the statistical efficiency of the minimax estimator, substantially improving sample utilization. This is the first fast-rate result for offline learning in KL-regularized zero-sum games obtained without pessimism-based mechanisms.
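To fix notation for the setting the summary describes: in a KL-regularized two-player zero-sum game, each player's policy is anchored to a fixed reference policy through a KL penalty. A standard form of the objective (our notation, not necessarily the paper's) is:

```latex
\max_{\mu}\;\min_{\nu}\;
\mathbb{E}_{a\sim\mu,\; b\sim\nu}\!\left[r(a,b)\right]
\;-\;\beta\,\mathrm{KL}\!\left(\mu \,\|\, \mu_{\mathrm{ref}}\right)
\;+\;\beta\,\mathrm{KL}\!\left(\nu \,\|\, \nu_{\mathrm{ref}}\right)
```

Here μ and ν are the max and min players' policies, μ_ref and ν_ref the fixed reference policies, and β > 0 the regularization strength. The regularization is what makes best responses smooth in the opponent's policy, the property the analysis exploits.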
📝 Abstract
We study offline learning in KL-regularized two-player zero-sum games, where policies are optimized under a KL constraint to a fixed reference policy. Prior work relies on pessimistic value estimation to handle distribution shift, yielding only $\widetilde{\mathcal{O}}(1/\sqrt n)$ statistical rates. We develop a new pessimism-free algorithm and analytical framework for KL-regularized games, built on the smoothness of KL-regularized best responses and a stability property of the Nash equilibrium induced by skew symmetry. This yields the first $\widetilde{\mathcal{O}}(1/n)$ sample complexity bound for offline learning in KL-regularized zero-sum games, achieved entirely without pessimism. We further propose an efficient self-play policy optimization algorithm and prove that, with a number of iterations linear in the sample size, it achieves the same fast $\widetilde{\mathcal{O}}(1/n)$ statistical rate as the minimax estimator.
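To make the self-play setup in the abstract concrete, here is a minimal sketch on a toy zero-sum matrix game with known payoffs. The payoff matrix, uniform-free reference policies, damping scheme, and β below are our illustrative choices; this shows only the general mechanics of KL-regularized self-play, not the paper's offline algorithm.

```python
import numpy as np

def kl_best_response(payoff_vec, ref, beta):
    # KL-regularized best response: an exponential tilt of the
    # reference policy, proportional to ref * exp(payoff / beta).
    logits = np.log(ref) + payoff_vec / beta
    w = np.exp(logits - logits.max())  # shift for numerical stability
    return w / w.sum()

# Toy zero-sum game: rock-paper-scissors payoff for the row (max) player.
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])
beta = 1.0                          # strength of the KL regularizer
mu_ref = np.array([0.5, 0.3, 0.2])  # reference (anchor) policies
nu_ref = np.array([0.2, 0.3, 0.5])
mu, nu = mu_ref.copy(), nu_ref.copy()

# Damped self-play: each player moves halfway toward its KL-regularized
# best response against the other player's current policy.
for _ in range(200):
    mu_new = kl_best_response(A @ nu, mu_ref, beta)       # max player
    nu_new = kl_best_response(-(A.T @ mu), nu_ref, beta)  # min player
    mu, nu = 0.5 * (mu + mu_new), 0.5 * (nu + nu_new)
```

Because the regularized best response is a contraction here, the iterates converge to the regularized Nash equilibrium, where the duality gap of the regularized game vanishes.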
Problem

Research questions and friction points this paper is trying to address.

- offline learning
- KL-regularized games
- zero-sum games
- sample complexity
- distribution shift
Innovation

Methods, ideas, or system contributions that make the work stand out.

- offline learning
- KL regularization
- zero-sum games
- pessimism-free
- sample complexity