🤖 AI Summary
This paper investigates the convergence of regret matching (RM) and its variants in potential games and constrained optimization. Focusing on the alternating RM$^+$ algorithm, it establishes the first such theoretical guarantee: convergence to an $\varepsilon$-KKT point in $O(1/\varepsilon^4)$ iterations, improving to $O(1/\varepsilon^2)$ when the cumulative regrets remain bounded. Leveraging tools from online learning and optimization theory, in particular cumulative-regret analysis and a precise characterization of KKT conditions, the paper shows that RM$^+$ enjoys a local one-step improvement property once it reaches a certain region, which it then never leaves. These results establish RM$^+$ as a sound and provably efficient first-order constrained optimizer based on regret minimization, consistent with empirical evidence that it outperforms gradient-descent-type methods on benchmark problems. In sharp contrast, the paper proves a lower bound: RM, with or without alternation, can require exponentially many iterations to reach even a crude approximate solution in two-player potential games. This yields the first worst-case separation between RM and RM$^+$, and shows that convergence to coarse correlated equilibria in potential games can be exponentially faster than convergence to Nash equilibria, addressing a long-standing open question about regret-based algorithms beyond zero-sum games.
📝 Abstract
Regret matching (RM) -- and its modern variants -- is a foundational online algorithm that has been at the heart of many AI breakthrough results in solving benchmark zero-sum games, such as poker. Yet, surprisingly little is known in theory about its convergence beyond two-player zero-sum games. For example, whether regret matching converges to Nash equilibria in potential games has been an open problem for two decades. Even beyond games, one could try to use RM variants for general constrained optimization problems. Recent empirical evidence suggests that they -- particularly regret matching$^+$ (RM$^+$) -- attain strong performance on benchmark constrained optimization problems, outperforming traditional gradient descent-type algorithms.
We show that alternating RM$^+$ converges to an $\varepsilon$-KKT point after $O(1/\varepsilon^4)$ iterations, establishing for the first time that it is a sound and fast first-order optimizer. Our argument relates the KKT gap to the accumulated regret, two quantities that are entirely disparate in general but interact in an intriguing way in our setting, so much so that when regrets are bounded, our complexity bound improves all the way to $O(1/\varepsilon^2)$. From a technical standpoint, while RM$^+$ does not have the usual one-step improvement property in general, we show that it does in a certain region that the algorithm will quickly reach and remain in thereafter. In sharp contrast, our second main result establishes a lower bound: RM, with or without alternation, can take an exponential number of iterations to reach a crude approximate solution even in two-player potential games. This represents the first worst-case separation between RM and RM$^+$. Our lower bound shows that convergence to coarse correlated equilibria in potential games is exponentially faster than convergence to Nash equilibria.
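To make the algorithm concrete, the following is a minimal sketch of the plain RM$^+$ update used as a first-order optimizer over the probability simplex. It is illustrative only: the function names and the quadratic test objective are ours, and the paper's results concern the *alternating* variant, which this sketch does not implement.

```python
import numpy as np

def rm_plus(grad, x0, T=2000):
    """Illustrative plain RM+ over the probability simplex.

    grad: callable returning the objective's gradient at x; each
          gradient coordinate plays the role of an action's loss.
    x0:   initial point on the simplex.
    """
    n = len(x0)
    R = np.zeros(n)              # thresholded cumulative regrets
    x = np.array(x0, dtype=float)
    for _ in range(T):
        g = grad(x)
        loss = float(g @ x)      # expected loss of current mixed strategy
        # RM+ update: accumulate instantaneous regrets, clip at zero
        R = np.maximum(0.0, R + (loss - g))
        s = R.sum()
        # play proportionally to positive regrets (uniform if all zero)
        x = R / s if s > 0 else np.full(n, 1.0 / n)
    return x

# Toy example (ours): minimize ||x - c||^2 over the simplex,
# whose unique KKT point is x = c itself.
c = np.array([0.2, 0.3, 0.5])
x = rm_plus(lambda x: 2 * (x - c), np.full(3, 1 / 3))
```

On this toy convex problem the iterates settle near the KKT point `c`; the paper's analysis gives the worst-case iteration complexity for general smooth objectives.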