🤖 AI Summary
This paper investigates the learnability of mixed-strategy Nash equilibria (NE) in finite games under uncoupled learning, where players observe only their own payoffs and cannot access opponents' utilities, yet may maintain local auxiliary states (higher-order dynamics) for strategy updates. Methodologically, it establishes a connection between uncoupled learning and feedback stabilization with decentralized control. Using this connection, it proves that any finite game with an isolated completely mixed-strategy NE admits higher-order uncoupled dynamics that converge locally to that NE, and it shows that no single higher-order dynamics is universal: two games are constructed such that any dynamics learning the completely mixed NE of one cannot learn that of the other. The paper also introduces the asymptotic best response (ABR) property, under which players asymptotically learn a best response in asymptotically stationary environments; ABR is related to an internal stability condition on the learning dynamics, and conditions are given under which NE are compatible with it. Finally, the paper addresses learnability of mixed-strategy NE in the bandit setting via a bandit version of higher-order replicator dynamics.
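The failure mode that motivates higher-order dynamics can be seen in a minimal sketch (illustrative, not taken from the paper): standard first-order replicator dynamics in matching pennies orbit the unique completely mixed NE at (1/2, 1/2) instead of converging to it, so extra auxiliary states are needed to stabilize the equilibrium.

```python
import numpy as np

# Matching pennies: the unique NE is completely mixed, x* = y* = (1/2, 1/2).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])   # row player's payoff matrix
B = -A                        # zero-sum: column player's payoff matrix

def replicator_step(x, y, dt=0.01):
    """One forward-Euler step of standard (first-order) replicator dynamics."""
    ux = A @ y                       # per-action payoffs for the row player
    uy = B.T @ x                     # per-action payoffs for the column player
    x = x + dt * x * (ux - x @ ux)   # actions beating the average grow
    y = y + dt * y * (uy - y @ uy)
    return x / x.sum(), y / y.sum()  # renormalize against Euler drift

x, y = np.array([0.6, 0.4]), np.array([0.5, 0.5])  # start near the NE
dists = []
for _ in range(20000):
    x, y = replicator_step(x, y)
    dists.append(abs(x[0] - 0.5))

# The trajectory keeps cycling around the NE; the distance never dies out.
print(max(dists[-2000:]))
```

The mixed NE here is surrounded by (approximately) closed orbits of the replicator flow, so the distance to the NE oscillates rather than decaying; higher-order dynamics add auxiliary states precisely to break such oscillations.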
📝 Abstract
We study learnability of mixed-strategy Nash Equilibrium (NE) in general finite games using higher-order replicator dynamics as well as classes of higher-order uncoupled heterogeneous dynamics. In higher-order uncoupled learning dynamics, players have no access to utilities of opponents (uncoupled) but are allowed to use auxiliary states to further process information (higher-order). We establish a link between uncoupled learning and feedback stabilization with decentralized control. Using this association, we show that for any finite game with an isolated completely mixed-strategy NE, there exist higher-order uncoupled learning dynamics that lead (locally) to that NE. We further establish the lack of universality of learning dynamics by linking learning to the control theoretic concept of simultaneous stabilization. We construct two games such that any higher-order dynamics that learn the completely mixed-strategy NE of one of these games can never learn the completely mixed-strategy NE of the other. Next, motivated by imposing natural restrictions on allowable learning dynamics, we introduce the Asymptotic Best Response (ABR) property. Dynamics with the ABR property asymptotically learn a best response in environments that are asymptotically stationary. We show that the ABR property relates to an internal stability condition on higher-order learning dynamics. We provide conditions under which NE are compatible with the ABR property. Finally, we address learnability of mixed-strategy NE in the bandit setting using a bandit version of higher-order replicator dynamics.
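In the bandit setting a player observes only the realized payoff of the action actually played. A standard way to drive replicator-style updates from such feedback is the importance-weighted payoff estimator familiar from bandit algorithms such as EXP3 (the paper's exact construction may differ); dividing the observed payoff by the probability of the sampled action yields an unbiased estimate of the full payoff vector. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])       # row player's payoff matrix (matching pennies)
x = np.array([0.5, 0.5])          # row player's mixed strategy
y = np.array([0.3, 0.7])          # opponent's (unobserved) mixed strategy
true_u = A @ y                    # expected per-action payoffs

def bandit_payoff_estimate(x, y):
    """Importance-weighted estimate of the payoff vector from one sample."""
    a = rng.choice(2, p=x)        # action actually played
    b = rng.choice(2, p=y)        # opponent's sampled action
    reward = A[a, b]              # only this scalar is observed
    u_hat = np.zeros(2)
    u_hat[a] = reward / x[a]      # unbiased: E[u_hat] = A @ y
    return u_hat

# Averaging many one-sample estimates recovers the true payoff vector.
est = np.mean([bandit_payoff_estimate(x, y) for _ in range(100000)], axis=0)
print(est, true_u)
```

Plugging such an estimate into the replicator update in place of the exact payoff vector gives a bandit replicator scheme; because the estimator's variance blows up as probabilities shrink, step sizes must be chosen carefully for the dynamics to retain their convergence behavior.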