🤖 AI Summary
This paper addresses equilibrium learning in online Stackelberg general-sum games, in which a leader and a follower interact sequentially. To handle nonlinearity and coupling in the joint action space, we introduce the *Stackelberg manifold*: a smooth Riemannian manifold onto which a learned diffeomorphism, implemented with neural normalizing flows, maps the joint action space. The agents' reward functions are assumed to be linear on this manifold, and the mapping yields tractable isoplanar subspaces, so standard bandit algorithms can be applied for online learning. We develop a theoretical basis for regret minimization on convex manifolds and establish finite-time bounds on simple regret for learning Stackelberg equilibria. Experiments on cybersecurity and economic supply chain optimization tasks show that the approach is effective relative to standard baselines.
📝 Abstract
We present a novel framework for online learning in Stackelberg general-sum games, where two agents, the leader and follower, engage in sequential turn-based interactions. At the core of this approach is a learned diffeomorphism that maps the joint action space to a smooth Riemannian manifold, referred to as the Stackelberg manifold. This mapping, facilitated by neural normalizing flows, ensures the formation of tractable isoplanar subspaces, enabling efficient techniques for online learning. By assuming linearity between the agents' reward functions on the Stackelberg manifold, our construct allows the application of standard bandit algorithms. We then provide a rigorous theoretical basis for regret minimization on convex manifolds and establish finite-time bounds on simple regret for learning Stackelberg equilibria. This integration of manifold learning into game theory uncovers a previously unrecognized potential for neural normalizing flows as an effective tool for multi-agent learning. We present empirical results demonstrating the effectiveness of our approach compared to standard baselines, with applications spanning domains such as cybersecurity and economic supply chain optimization.
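The pipeline described above (an invertible map of the joint action space, rewards that are linear in the mapped coordinates, and a standard bandit algorithm on top) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a fixed, hand-written affine coupling layer (`flow_forward`, with hypothetical parameters `w` and `b`) stands in for a learned normalizing flow, the reward vectors `theta_leader` and `theta_follower` are invented, the follower is assumed to know its own reward and to best-respond exactly, and LinUCB is used as one example of a standard linear bandit algorithm.

```python
import numpy as np

def flow_forward(x, w=1.0, b=0.5):
    """Toy affine coupling layer (hypothetical stand-in for a learned flow):
    keeps the leader action and rescales/shifts the follower action on it."""
    a, f = x[..., 0], x[..., 1]
    return np.stack([a, f * np.exp(np.tanh(w * a)) + b * a], axis=-1)

def flow_inverse(z, w=1.0, b=0.5):
    """Exact inverse of flow_forward, so the map is smooth and invertible."""
    a, y = z[..., 0], z[..., 1]
    return np.stack([a, (y - b * a) * np.exp(-np.tanh(w * a))], axis=-1)

def run_linucb(T=500, alpha=1.0, lam=1.0, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    leader_actions = np.linspace(-1.0, 1.0, 5)
    follower_actions = np.linspace(-1.0, 1.0, 5)
    theta_leader = np.array([1.0, -1.0])   # assumed; unknown to the learner
    theta_follower = np.array([0.0, 1.0])  # assumed; known to the follower

    # For each leader action, the follower best-responds in mapped coordinates.
    feats = []
    for a in leader_actions:
        joint = np.stack([np.full_like(follower_actions, a), follower_actions], axis=-1)
        phi = flow_forward(joint)
        feats.append(phi[np.argmax(phi @ theta_follower)])
    feats = np.array(feats)                # one feature vector per leader action

    A = lam * np.eye(2)                    # regularized Gram matrix
    bvec = np.zeros(2)                     # reward-weighted feature sum
    for _ in range(T):
        A_inv = np.linalg.inv(A)
        theta_hat = A_inv @ bvec           # ridge estimate of the leader reward
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", feats, A_inv, feats))
        k = int(np.argmax(feats @ theta_hat + alpha * bonus))
        reward = feats[k] @ theta_leader + noise * rng.normal()
        A += np.outer(feats[k], feats[k])
        bvec += reward * feats[k]

    # Recommend the leader action that looks best under the final estimate.
    theta_hat = np.linalg.solve(A, bvec)
    recommended = int(np.argmax(feats @ theta_hat))
    true_values = feats @ theta_leader
    simple_regret = float(true_values.max() - true_values[recommended])
    return leader_actions[recommended], simple_regret

if __name__ == "__main__":
    action, regret = run_linucb()
    print("recommended leader action:", action, "simple regret:", regret)
```

In this toy setup the leader's reward is nonlinear in the original action but linear in the mapped coordinates, which is what lets a linear bandit estimate it with a two-dimensional parameter. The simple regret reported here is the gap between the best leader action and the recommended one, with the follower's best response held fixed, which is only one plausible reading of the quantity bounded in the paper.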