🤖 AI Summary
This work examines how decision ordering affects equilibrium outcomes in N-level Stackelberg games, where the default order supplied by the environment is not necessarily optimal. The authors propose Hierarchical Priority Adjustment (HPA), a method that selects the agents' decision order dynamically while the agents learn their policies. An upper-level policy selects the decision order based on the current game state, and lower-level agents act sequentially in that order within a Spatio-Temporal Sequential Markov Game (STMG). Learning across time scales is coordinated by a slow-fast update scheme and a shared intrinsic reward derived from the advantage function of the upper policy. The theoretical analysis shows that changing the decision order shifts the Stackelberg equilibrium unless special structural conditions are satisfied. Experiments on high-precision control tasks, including multi-agent MuJoCo, show that HPA outperforms benchmark algorithms and adapts robustly to changing environments.
📝 Abstract
Current research applying N-level Stackelberg games to multi-agent systems often uses the default decision order of agents provided by the environment. This raises the question: does the order of agents necessarily affect the final equilibrium point of the game? To address this, we formally analyze the N-level Stackelberg game and find that changing the order in which agents make decisions typically leads to an overdetermined system. As a result, the equilibrium point shifts unless special structural conditions are satisfied. Based on this analysis, we propose the Hierarchical Priority Adjustment (HPA) method, which adjusts and selects the agents' decision order. At the upper level, an upper policy dynamically selects the optimal decision order based on the current game state. At the lower level, agents execute their strategies in the Spatio-Temporal Sequential Markov Game (STMG) according to the selected order. To coordinate learning across time scales, we employ a slow-fast update scheme with shared intrinsic rewards derived from the advantage function of the upper policy. Experimental results on high-precision control tasks, including multi-agent MuJoCo, show that HPA outperforms benchmark algorithms and adapts robustly to changing environments. These results highlight the crucial role of optimizing the agents' decision order in N-level Stackelberg games.
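The claim that decision order can shift the equilibrium can be illustrated with a minimal sketch on a textbook two-player Stackelberg duopoly. This is not the paper's STMG or the HPA method. The linear inverse demand `price = a - q_leader - q_follower`, the marginal costs, and the function name `stackelberg_duopoly` are all assumptions chosen for illustration, and the closed form assumes an interior solution with positive quantities.

```python
from itertools import permutations


def stackelberg_duopoly(order, a=10.0, costs=(1.0, 1.0)):
    """Closed-form Stackelberg outcome for a linear-demand duopoly.

    Hypothetical toy model: inverse demand is price = a - (q_0 + q_1),
    and agent i has constant marginal cost costs[i].
    `order` is (leader_index, follower_index).
    """
    leader, follower = order
    c_l, c_f = costs[leader], costs[follower]

    # The follower best-responds to the leader: q_f = (a - c_f - q_l) / 2.
    # Substituting that into the leader's profit and maximizing gives q_l.
    q_l = (a + c_f - 2.0 * c_l) / 2.0
    q_f = (a - c_f - q_l) / 2.0
    price = a - q_l - q_f

    quantities = [0.0, 0.0]
    quantities[leader] = q_l
    quantities[follower] = q_f
    profits = [(price - costs[i]) * quantities[i] for i in range(2)]
    return quantities, profits


# Evaluate both possible decision orders of the same two agents.
outcomes = {order: stackelberg_duopoly(order) for order in permutations(range(2))}
for order, (quantities, profits) in outcomes.items():
    print(f"order={order} quantities={quantities} profits={profits}")
```

With identical costs, the two orders give mirrored outcomes: whichever agent moves first produces twice as much as the other and earns twice the profit, so the equilibrium point depends on the order even though the agents and payoffs are unchanged. Exhaustive enumeration of this kind is only feasible for tiny games, since the number of orders grows as N factorial, which is one motivation for learning the order with an upper-level policy as the abstract describes.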