From Average-Iterate to Last-Iterate Convergence in Games: A Reduction and Its Applications

📅 2025-06-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper studies the convergence of online learning algorithms in games under self-play. The authors propose the first generic black-box reduction that transforms average-iterate convergent dynamics into last-iterate convergent dynamics, applicable to games with linear utilities, a family that includes two-player bimatrix games and multi-player polymatrix games. The reduction requires no modification of the base algorithm and no new convergence analysis. Applied to the Optimistic Multiplicative Weights Update (OMWU) algorithm in two-player zero-sum games, it yields an $O(\log d / T)$ last-iterate convergence rate under gradient feedback, an exponential improvement in the dependence on the dimension $d$, and an $\widetilde{O}(d^{1/5} T^{-1/5})$ rate under bandit feedback, both surpassing the prior state-of-the-art bounds.

📝 Abstract
The convergence of online learning algorithms in games under self-play is a fundamental question in game theory and machine learning. Among various notions of convergence, last-iterate convergence is particularly desirable, as it reflects the actual decisions made by the learners and captures the day-to-day behavior of the learning dynamics. While many algorithms are known to converge in the average-iterate, achieving last-iterate convergence typically requires considerably more effort in both the design and the analysis of the algorithm. Somewhat surprisingly, we show in this paper that for a large family of games, there exists a simple black-box reduction that transforms the average iterates of an uncoupled learning dynamics into the last iterates of a new uncoupled learning dynamics, thus also providing a reduction from last-iterate convergence to average-iterate convergence. Our reduction applies to games where each player's utility is linear in both their own strategy and the joint strategy of all opponents. This family includes two-player bimatrix games and generalizations such as multi-player polymatrix games. By applying our reduction to the Optimistic Multiplicative Weights Update algorithm, we obtain new state-of-the-art last-iterate convergence rates for uncoupled learning dynamics in two-player zero-sum normal-form games: (1) an $O(\frac{\log d}{T})$ last-iterate convergence rate under gradient feedback, representing an exponential improvement in the dependence on the dimension $d$ (i.e., the maximum number of actions available to either player); and (2) an $\widetilde{O}(d^{\frac{1}{5}} T^{-\frac{1}{5}})$ last-iterate convergence rate under bandit feedback, improving upon the previous best rates of $\widetilde{O}(\sqrt{d} T^{-\frac{1}{8}})$ and $\widetilde{O}(\sqrt{d} T^{-\frac{1}{6}})$.
Problem

Research questions and friction points this paper is trying to address.

Reducing last-iterate convergence to average-iterate convergence in games
Applying reduction to Optimistic Multiplicative Weights Update algorithm
Achieving improved last-iterate convergence rates in zero-sum games
Innovation

Methods, ideas, or system contributions that make the work stand out.

Black-box reduction for last-iterate convergence
Applies to linear utility games
Exponentially improves the dimension dependence of last-iterate convergence rates
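The intuition behind such a reduction can be illustrated with a toy sketch: if each player reports the running average of a base no-regret algorithm's iterates as their played strategy, then the last iterate of the wrapped dynamics equals the average iterate of the base dynamics, which is known to converge in zero-sum games. The sketch below is a hypothetical illustration of this idea, not the paper's exact construction (which uses OMWU and preserves uncoupledness under gradient and bandit feedback); it runs plain MWU self-play on rock-paper-scissors and averages the iterates:

```python
import numpy as np

# Hypothetical illustration (NOT the paper's exact reduction):
# run a base no-regret algorithm (MWU) in self-play, and let the
# wrapped dynamics "play" the running average of the base iterates.
# The last iterate of the wrapped dynamics is then exactly the
# average iterate of the base dynamics.

# Rock-paper-scissors payoff matrix for the maximizing player x.
A = np.array([[ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])

def mwu_step(p, grad, eta=0.1):
    # Multiplicative Weights Update on the probability simplex.
    w = p * np.exp(eta * grad)
    return w / w.sum()

def duality_gap(x, y):
    # Exploitability of (x, y) in the zero-sum game max_x min_y x^T A y;
    # zero exactly at a Nash equilibrium.
    return (A @ y).max() - (x @ A).min()

x = np.ones(3) / 3
y = np.ones(3) / 3
x_avg, y_avg = x.copy(), y.copy()

T = 2000
for t in range(1, T + 1):
    gx = A @ y        # gradient feedback for x (maximizer)
    gy = -(x @ A)     # gradient feedback for y (minimizer)
    x = mwu_step(x, gx)
    y = mwu_step(y, gy)
    # Wrapped dynamics: the played strategy is the running average.
    x_avg += (x - x_avg) / (t + 1)
    y_avg += (y - y_avg) / (t + 1)

# The base iterates (x, y) cycle, but the averaged play converges
# toward the uniform equilibrium, so the gap becomes small.
print(duality_gap(x_avg, y_avg))
```

Note that in this toy version the base algorithms still feed on each other's raw iterates; the paper's contribution is precisely a construction that makes the averaged play itself an uncoupled learning dynamics, so last-iterate guarantees follow from average-iterate ones in a black-box way.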