🤖 AI Summary
Problem: In time-varying multiplayer games, existing regret bounds for uncoupled online learning quickly become vacuous when the game varies significantly over time, even when those variations are easy to predict.
Method: This paper proposes a novel prediction-aware framework for time-varying games, in which payoffs depend on an underlying state of nature that agents predict online. To exploit these predictions, it introduces POWMU, a contextual extension of the optimistic Multiplicative Weights Update algorithm, and establishes theoretical guarantees on social welfare and convergence to equilibrium.
Contribution/Results: The guarantees depend explicitly on prediction quality: under bounded prediction errors, the framework achieves performance comparable to the static setting. A traffic routing experiment demonstrates the effectiveness of POWMU empirically.
📝 Abstract
The framework of uncoupled online learning in multiplayer games has made significant progress in recent years. In particular, the development of time-varying games has considerably expanded its modeling capabilities. However, current regret bounds quickly become vacuous when the game undergoes significant variations over time, even when these variations are easy to predict. Intuitively, the ability of players to forecast future payoffs should lead to tighter guarantees, yet existing approaches fail to incorporate this aspect. This work aims to fill this gap by introducing a novel prediction-aware framework for time-varying games, where agents can forecast future payoffs and adapt their strategies accordingly. In this framework, payoffs depend on an underlying state of nature that agents predict in an online manner. To leverage these predictions, we propose the POWMU algorithm, a contextual extension of the optimistic Multiplicative Weight Update algorithm, for which we establish theoretical guarantees on social welfare and convergence to equilibrium. Our results demonstrate that, under bounded prediction errors, the proposed framework achieves performance comparable to the static setting. Finally, we empirically demonstrate the effectiveness of POWMU in a traffic routing experiment.
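As a rough illustration of the optimistic update the abstract refers to, the following Python sketch implements a generic optimistic Multiplicative Weights Update step for a single player, with the optimistic term supplied by an external payoff forecast. It is not the paper's POWMU: the class name `OptimisticMWU`, the step size `eta`, and the way the forecast enters the update are assumptions made for illustration, and the contextual (state-dependent) part of the algorithm is not modeled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


class OptimisticMWU:
    """Generic optimistic MWU for one player (illustrative, not the paper's POWMU)."""

    def __init__(self, n_actions, eta):
        self.eta = eta                  # step size (assumed constant here)
        self.cum = [0.0] * n_actions    # cumulative realized payoffs per action

    def play(self, predicted_payoff):
        # Strategy is proportional to exp(eta * (past payoffs + forecast of next payoff)).
        scores = [self.eta * (c + m) for c, m in zip(self.cum, predicted_payoff)]
        return softmax(scores)

    def update(self, realized_payoff):
        # Only realized payoffs are accumulated; the forecast is used once and discarded.
        self.cum = [c + u for c, u in zip(self.cum, realized_payoff)]


# Hypothetical two-action example in which the forecast favors action 0.
learner = OptimisticMWU(n_actions=2, eta=0.5)
strategy = learner.play(predicted_payoff=[1.0, 0.0])
learner.update(realized_payoff=[1.0, 0.0])
```

With an accurate forecast, the played strategy already leans toward the action that pays off in the coming round. In classical optimistic MWU, the forecast is simply the previous round's payoff vector.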