🤖 AI Summary
This paper studies no-regret learning in multi-player general-sum games, aiming to minimize each player's cumulative regret. We propose an uncoupled online algorithm that integrates optimistic multiplicative weights update (OMWU) with an adaptive, non-monotonic learning rate scheme, incorporating a cautious-optimism strategy to dynamically adjust the learning pace. Theoretically, the algorithm achieves a per-player regret upper bound of $O(n \log^2 d \log T)$, where $n$ is the number of players, $d$ the number of actions per player, and $T$ the number of rounds. Compared to prior methods, this exponentially improves the dependence on the action dimension, from linear in $d$ to $\log^2 d$, and reduces the time dependence from $\log^4 T$ to $\log T$. To our knowledge, this is the tightest per-player regret bound established for multi-player general-sum games, significantly outperforming baselines such as Log-Regularized Lifted Optimistic FTRL and Optimistic Hedge.
📝 Abstract
We establish the first uncoupled learning algorithm that attains $O(n \log^2 d \log T)$ per-player regret in multi-player general-sum games, where $n$ is the number of players, $d$ is the number of actions available to each player, and $T$ is the number of repetitions of the game. Our results exponentially improve the dependence on $d$ compared to the $O(nd \log T)$ regret attainable by Log-Regularized Lifted Optimistic FTRL [Far+22c], and also reduce the dependence on the number of iterations $T$ from $\log^4 T$ to $\log T$ compared to Optimistic Hedge, the previously well-studied algorithm with $O(n \log d \log^4 T)$ regret [DFG21]. Our algorithm is obtained by combining the classic Optimistic Multiplicative Weights Update (OMWU) with an adaptive, non-monotonic learning rate that paces the learning process of the players, making them more cautious when their regret becomes too negative.
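To make the two ingredients concrete, here is a minimal sketch of a single OMWU step together with a *hypothetical* pacing rule. The OMWU update itself is standard (update against the optimistic loss estimate $2\ell_t - \ell_{t-1}$); the `cautious_eta` function is only an illustration of the idea of shrinking the step size when regret becomes very negative, not the paper's actual learning-rate schedule.

```python
import numpy as np

def omwu_step(x, loss, prev_loss, eta):
    """One Optimistic Multiplicative Weights Update (OMWU) step.

    Uses the previous loss vector as an optimistic prediction of the
    next one, i.e. updates the strategy x against 2*loss - prev_loss.
    """
    logits = np.log(x) - eta * (2.0 * loss - prev_loss)
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    return w / w.sum()               # renormalize to the simplex

def cautious_eta(cum_regret, base_eta):
    """Illustrative (NOT the paper's) cautious pacing rule: shrink the
    learning rate when cumulative regret becomes very negative, making
    the player update more conservatively."""
    return base_eta / (1.0 + max(0.0, -cum_regret))

# Example: a player with d = 3 actions, starting from the uniform strategy.
x = np.ones(3) / 3
loss, prev_loss = np.array([0.2, 0.5, 0.1]), np.zeros(3)
x = omwu_step(x, loss, prev_loss, eta=cautious_eta(0.0, 0.1))
```

The non-monotonicity in the paper means the learning rate can both shrink and grow over time, unlike the usual decreasing schedules; the sketch above only captures the "more cautious when regret is too negative" direction.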