Optimism Without Regularization: Constant Regret in Zero-Sum Games

📅 2025-06-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether optimistic fictitious play (OFP), without regularization, achieves constant regret in two-player zero-sum games. For the long-standing open case of two-strategy games, we establish for the first time that OFP attains $O(1)$ constant regret irrespective of the tie-breaking rule, refuting the prior belief that only regularized algorithms such as optimistic follow-the-regularized-leader (FTRL) enjoy this property. Methodologically, we develop a geometric analysis framework in the dual space of payoff vectors, introduce an energy function of the iterates, and prove its boundedness to characterize the convergence dynamics of OFP. We rigorously disentangle the effects of optimism and alternating updates on regret, identifying optimism as the essential mechanism behind constant regret. Furthermore, we construct an $\Omega(\sqrt{T})$ lower bound showing that alternating fictitious play cannot achieve $o(\sqrt{T})$ regret, which highlights that optimistic prediction is indispensable. This work provides the first proof of optimal (constant) regret for unregularized optimistic algorithms in zero-sum games.
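The dynamic studied in the paper can be sketched as follows. This is a minimal illustrative simulation under the standard formulation of Optimistic Fictitious Play (each player best-responds to the opponent's empirical play plus one extra copy of the opponent's most recent action), not the authors' code; the game instance and regret bookkeeping are assumptions for demonstration.

```python
import numpy as np

def optimistic_fp(A, T):
    """Optimistic Fictitious Play on a zero-sum game with payoff matrix A
    (row player maximizes x @ A @ y, column player minimizes it).

    Each round, both players best-respond to the opponent's cumulative
    empirical play plus one extra copy of the opponent's most recent
    action (the "optimistic" prediction). np.argmax/np.argmin break ties
    by taking the first index; the paper's result holds for any tie-break.
    """
    n, m = A.shape
    X, Y = np.zeros(n), np.zeros(m)        # cumulative action counts
    x, y = np.eye(n)[0], np.eye(m)[0]      # arbitrary initial pure actions
    row_payoff = 0.0
    for _ in range(T):
        X += x
        Y += y
        row_payoff += x @ A @ y
        x_next = np.eye(n)[int(np.argmax(A @ (Y + y)))]
        y_next = np.eye(m)[int(np.argmin(A.T @ (X + x)))]
        x, y = x_next, y_next
    # External regret of the row player: payoff of the best fixed pure
    # strategy in hindsight minus the realized cumulative payoff.
    return float((A @ Y).max() - row_payoff)

# Matching pennies, a two-strategy zero-sum game with value 0.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
regret = optimistic_fp(A, 1000)
print(regret)  # remains bounded as T grows, per the paper's main result
```

On this instance the iterates settle into a short cycle, so the row player's regret stays bounded instead of growing like $\sqrt{T}$ as it can for plain fictitious play.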

📝 Abstract
This paper studies the optimistic variant of Fictitious Play for learning in two-player zero-sum games. While it is known that Optimistic FTRL -- a regularized algorithm with a bounded stepsize parameter -- obtains constant regret in this setting, we show for the first time that similar, optimal rates are also achievable without regularization: we prove for two-strategy games that Optimistic Fictitious Play (using any tiebreaking rule) obtains only constant regret, providing surprising new evidence on the ability of non-no-regret algorithms for fast learning in games. Our proof technique leverages a geometric view of Optimistic Fictitious Play in the dual space of payoff vectors, where we show a certain energy function of the iterates remains bounded over time. Additionally, we also prove a regret lower bound of $\Omega(\sqrt{T})$ for Alternating Fictitious Play. In the unregularized regime, this separates the ability of optimism and alternation in achieving $o(\sqrt{T})$ regret.
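For contrast, Alternating Fictitious Play can be sketched in the same style: the players update in turns and neither uses an optimistic prediction. This is an illustrative reconstruction under assumed conventions, not the paper's lower-bound construction (the $\Omega(\sqrt{T})$ bound relies on a specific hard instance).

```python
import numpy as np

def alternating_fp(A, T):
    """Alternating Fictitious Play on a zero-sum game with payoff matrix A:
    no optimism, and the column player best-responds to empirical play that
    already includes the row player's action from the current round."""
    n, m = A.shape
    X = np.zeros(n)
    Y = np.eye(m)[0]          # seed with an arbitrary first column action
    Y_faced = np.zeros(m)     # column actions the row player actually faced
    row_payoff = 0.0
    for _ in range(T):
        x = np.eye(n)[int(np.argmax(A @ Y))]     # row best-responds first
        X += x
        y = np.eye(m)[int(np.argmin(A.T @ X))]   # column responds after seeing x
        Y += y
        Y_faced += y
        row_payoff += x @ A @ y
    # Row player's external regret over the T rounds.
    return float((A @ Y_faced).max() - row_payoff)

A = np.array([[1.0, -1.0], [-1.0, 1.0]])
afp_regret = alternating_fp(A, 1000)
print(afp_regret)
```

The only structural differences from the optimistic variant are the sequential update order and the absence of the extra copy of the opponent's last action; the paper shows these alone cannot deliver $o(\sqrt{T})$ regret.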
Problem

Research questions and friction points this paper is trying to address.

Achieving constant regret in zero-sum games without regularization
Comparing optimism and alternation in unregularized learning algorithms
Analyzing geometric properties of Optimistic Fictitious Play in dual space
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimistic Fictitious Play without regularization
Geometric view in dual payoff space
Constant-regret guarantee for two-strategy games