🤖 AI Summary
This paper investigates the fundamental distinction between non-ergodic convergence behaviors in game-theoretic learning, focusing on Optimistic Multiplicative Weights Update (OMWU) in $2 \times 2$ matrix games. It analyzes three iteration types—best iterate, random iterate, and last iterate—and establishes that best-iterate and random-iterate convergence are provably inequivalent. Methodologically, the authors develop a two-phase analytical framework to derive an $O(T^{-1/6})$ non-ergodic convergence rate for the best iterate, while constructing explicit counterexamples showing that no polynomial lower bound exists for random-iterate convergence. Leveraging dynamic regret analysis, duality gap metrics, and both asymptotic and non-asymptotic theory, they challenge the conventional assumption of equivalence between best and random iterates. The work provides new theoretical foundations and practical criteria for non-ergodic learning dynamics in adversarial and game-theoretic settings.
📝 Abstract
Non-ergodic convergence of learning dynamics in games has been widely studied recently because of its importance in both theory and practice. Recent work (Cai et al., 2024) showed that a broad class of learning dynamics, including Optimistic Multiplicative Weights Update (OMWU), can exhibit arbitrarily slow last-iterate convergence even in simple $2 \times 2$ matrix games, despite many of these dynamics being known to converge asymptotically in the last iterate. It remains unclear, however, whether these algorithms achieve fast non-ergodic convergence under weaker criteria, such as best-iterate convergence. We show that for $2 \times 2$ matrix games, OMWU achieves an $O(T^{-1/6})$ best-iterate convergence rate, in stark contrast to its slow last-iterate convergence in the same class of games. Furthermore, we establish a lower bound showing that OMWU does not achieve any polynomial random-iterate convergence rate, measured by the expected duality gaps across all iterates. This result challenges the conventional wisdom that random-iterate convergence is essentially equivalent to best-iterate convergence, with the former often used as a proxy for establishing the latter. Our analysis uncovers a new connection to dynamic regret and presents a novel two-phase approach to best-iterate convergence, which could be of independent interest.
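To make the objects in the abstract concrete, here is a minimal sketch of the standard OMWU dynamics in a $2 \times 2$ zero-sum matrix game, tracking the duality gap of the last iterate and of the best iterate seen so far. The payoff matrix, step size, initialization, and horizon are illustrative assumptions, not values from the paper, and the code shows only the well-known OMWU update rule, not the paper's analysis.

```python
import numpy as np

# Illustrative 2x2 zero-sum game (matching pennies); x maximizes x^T A y,
# y minimizes. eta and T are arbitrary demo choices.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
eta, T = 0.1, 2000

x = np.array([0.7, 0.3])  # off-equilibrium start (assumed for illustration)
y = np.array([0.4, 0.6])
g_x_prev, g_y_prev = A @ y, x @ A  # previous-round gradients

best_gap, last_gap = np.inf, np.inf
for t in range(T):
    g_x, g_y = A @ y, x @ A
    # Optimistic update: use the extrapolated gradient 2*g_t - g_{t-1}
    x = x * np.exp(eta * (2 * g_x - g_x_prev))   # x-player ascends
    x /= x.sum()
    y = y * np.exp(-eta * (2 * g_y - g_y_prev))  # y-player descends
    y /= y.sum()
    g_x_prev, g_y_prev = g_x, g_y
    # Duality gap of the current (last) iterate:
    # max_i (A y)_i - min_j (x^T A)_j, zero exactly at equilibrium.
    last_gap = (A @ y).max() - (x @ A).min()
    best_gap = min(best_gap, last_gap)  # best-iterate criterion
```

The best-iterate gap is by definition monotone and never exceeds the last-iterate gap; the paper's results concern how fast each of these quantities (and the gap of a uniformly random iterate) can be guaranteed to shrink.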