🤖 AI Summary
This paper addresses the problem of learning approximate correlated equilibria in convex games with unknown cost functions, where players minimize convex costs—dependent on others’ strategies—subject to convex constraints. We propose an active learning framework orchestrated by a central coordinator that queries players’ regret values to drive equilibrium computation. Our core methodological contribution lies in integrating diversity-maximizing action selection with Bayesian optimization to online estimate the joint strategy distribution over infinite action spaces, thereby approximating a correlated equilibrium. The approach unifies regret minimization, heuristic representative action sampling, and probabilistic modeling—without requiring prior knowledge of cost function structure. Experiments on multi-user traffic assignment games demonstrate that the learned strategy distribution achieves significantly lower aggregate regret, validating both effectiveness and scalability.
📝 Abstract
Correlated equilibrium generalizes Nash equilibrium by allowing a central coordinator to guide players' actions through shared recommendations, similar to how routing apps guide drivers. We investigate how a coordinator can learn a correlated equilibrium in convex games where each player minimizes a convex cost function that depends on other players' actions, subject to convex constraints without knowledge of the players' cost functions. We propose a learning framework that learns an approximate correlated equilibrium by actively querying players' regrets, emph{i.e.}, the cost saved by deviating from the coordinator's recommendations. We first show that a correlated equilibrium in convex games corresponds to a joint action distribution over an infinite joint action space that minimizes all players' regrets. To make the learning problem tractable, we introduce a heuristic that selects finitely many representative joint actions by maximizing their pairwise differences. We then apply Bayesian optimization to learn a probability distribution over the selected joint actions by querying all players' regrets. The learned distribution approximates a correlated equilibrium by minimizing players' regrets. We demonstrate the proposed approach via numerical experiments on multi-user traffic assignment games in a shared transportation network.