🤖 AI Summary
This work studies how a strategic optimizer can steer a no-regret learner to a Stackelberg equilibrium in repeated two-player, finite-action games, *without knowledge of the learner’s payoff function*. We first show that no guiding policy can reliably induce convergence when the optimizer knows only that the learner uses some algorithm from the general class of no-regret algorithms, so the optimizer needs more information about the learner’s objectives or algorithm. Building on this, we propose a paradigm centered on *inverse inference of the learner’s payoff structure*: the optimizer inverts the learner’s observed update dynamics to recover the payoff information it needs. For restricted algorithmic classes, specifically ascent algorithms and stochastic mirror ascent with a known regularizer and step sizes, we combine game-theoretic modeling, online-learning analysis, and inverse modeling of gradient/mirror ascent dynamics to construct guiding policies with provable convergence guarantees, attaining near-Stackelberg-optimal utility for the optimizer. This establishes the first theoretically grounded framework for strategic steering of a black-box no-regret learner.
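
To make the inverse-inference step concrete, here is a minimal sketch (our illustration, not code from the paper) of the mirror-ascent case: if the learner runs entropic mirror ascent (multiplicative weights) on the simplex with a step size the optimizer knows, the optimizer can invert the update and read off the learner’s payoff gradient up to an additive constant. The payoff matrix `B`, step size `eta`, and helper names below are illustrative assumptions, not the paper’s notation.

```python
# Sketch: recovering a learner's payoffs from one step of entropic mirror
# ascent (multiplicative weights) with a KNOWN step size. Illustrative only;
# B, eta, and all names are assumptions.
import numpy as np

rng = np.random.default_rng(0)
B = rng.uniform(size=(3, 4))   # learner's payoff matrix, hidden from optimizer
eta = 0.1                      # learner's step size, known to the optimizer

def mirror_ascent_step(x, g, eta):
    """Entropic mirror ascent on the simplex: x_next[i] is proportional to
    x[i] * exp(eta * g[i])."""
    w = x * np.exp(eta * g)
    return w / w.sum()

def recover_gradient(x_prev, x_next, eta):
    """Invert the known update: log(x_next/x_prev)/eta equals the payoff
    gradient up to the additive normalization constant, which we center away."""
    g = np.log(x_next / x_prev) / eta
    return g - g.mean()

# Optimizer plays pure action a, so the learner's gradient is the row B[a].
a = 1
x = np.full(4, 0.25)                       # learner's current mixed strategy
x_next = mirror_ascent_step(x, B[a], eta)  # observed next strategy

g_hat = recover_gradient(x, x_next, eta)
print(np.allclose(g_hat, B[a] - B[a].mean()))  # True: payoffs up to a constant
```

Payoff differences between the learner’s actions are recovered exactly, and differences are all that matter for predicting the learner’s best responses.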
📝 Abstract
We consider the problem of learning to exploit learning algorithms through repeated interactions in games. Specifically, we focus on repeated two-player, finite-action games, in which an optimizer aims to steer a no-regret learner to a Stackelberg equilibrium without knowledge of its payoffs. We first show that this is impossible if the optimizer only knows that the learner is using an algorithm from the general class of no-regret algorithms. This suggests that the optimizer requires more information about the learner's objectives or algorithm to exploit it successfully. Building on this intuition, we reduce the optimizer's problem to that of recovering the learner's payoff structure. We demonstrate the effectiveness of this approach when the learner's algorithm is drawn from a smaller class by analyzing two examples: one where the learner uses an ascent algorithm, and another where the learner uses stochastic mirror ascent with known regularizer and step sizes.
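
For intuition on what recovering the learner’s payoff structure buys the optimizer, the sketch below shows the standard follow-up step: with both payoff matrices in hand, an optimal strategy to commit to (a Stackelberg strategy) can be computed by solving one linear program per learner action, as in Conitzer and Sandholm’s commitment LP. This is our illustration under those assumptions, not a procedure taken from the paper; the matrices `A` and `B` and the function name are hypothetical.

```python
# Sketch: computing an optimal strategy to commit to once the learner's
# payoff matrix B has been recovered. Uses the classic one-LP-per-learner-
# action formulation; A, B, and stackelberg_strategy are illustrative names.
import numpy as np
from scipy.optimize import linprog

def stackelberg_strategy(A, B):
    """Optimizer's optimal commitment for payoff matrices A (optimizer) and
    B (learner), with the learner breaking ties in the optimizer's favor."""
    n_opt, n_learner = A.shape
    best_val, best_x = -np.inf, None
    for j in range(n_learner):
        # Maximize x @ A[:, j] over the simplex, subject to action j being
        # a learner best response: x @ (B[:, k] - B[:, j]) <= 0 for all k.
        res = linprog(
            c=-A[:, j],                       # linprog minimizes, so negate
            A_ub=(B - B[:, [j]]).T,           # one best-response row per k
            b_ub=np.zeros(n_learner),
            A_eq=np.ones((1, n_opt)),
            b_eq=np.array([1.0]),             # x sums to one
            bounds=[(0, 1)] * n_opt,
        )
        if res.success and -res.fun > best_val:
            best_val, best_x = -res.fun, res.x
    return best_x, best_val

rng = np.random.default_rng(1)
A = rng.uniform(size=(3, 4))   # optimizer's own payoffs
B = rng.uniform(size=(3, 4))   # learner's payoffs, e.g. recovered as above
x_star, value = stackelberg_strategy(A, B)
print(x_star, value)
```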