Rethinking Priority Scheduling for Sequential Multi-Agent Decision Making in Stackelberg Games

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the underexplored impact of decision ordering on equilibrium outcomes in N-level Stackelberg games, where the default order supplied by the environment is not necessarily optimal. The authors propose Hierarchical Priority Adjustment (HPA), a method that adjusts and selects the agents' decision order while the agents learn their policies. In this framework, an upper-level policy selects a decision order based on the current game state, and the lower-level agents act sequentially in that order within a Spatio-Temporal Sequential Markov Game (STMG). Learning across time scales is coordinated by a slow-fast update scheme and shared intrinsic rewards derived from the advantage function of the upper policy. The theoretical analysis shows that changing the decision order typically shifts the equilibrium point unless special structural conditions are satisfied. Experiments on high-precision control tasks, including multi-agent MuJoCo, show that HPA outperforms benchmark algorithms and adapts robustly to changing environments.

📝 Abstract

Current research applying N-level Stackelberg games to multi-agent systems often uses the default decision order of agents provided by the environment. This raises a question: does the order in which agents act affect the final equilibrium point of the game? To address this, we formally analyze the N-level Stackelberg game, where changing the order in which agents make decisions typically leads to an overdetermined system. As a result, the equilibrium point shifts unless special structural conditions are satisfied. Based on this analysis, we propose the Hierarchical Priority Adjustment (HPA) method, which adjusts and selects the agents' decision order. At the upper level, an upper policy dynamically selects the decision order of agents based on the current game state. At the lower level, agents execute strategies in the Spatio-Temporal Sequential Markov Game (STMG) according to the selected order. To coordinate learning across time scales, we employ a slow-fast update scheme with shared intrinsic rewards derived from the advantage function of the upper policy. Experimental results on high-precision control tasks, including multi-agent MuJoCo, show that HPA outperforms benchmark algorithms and adapts robustly to changing environments. These results highlight the crucial role of optimizing the agents' decision order in N-level Stackelberg games.
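
The claim that decision order can move the equilibrium is easy to see in a toy setting. The sketch below is not the paper's HPA method or its STMG formulation; it is a minimal illustration, assuming a small discrete game with pure strategies, solved by backward induction for a given order. The function `solve_sequential`, the payoff tables `A` and `B`, and the tie-breaking rule (lowest action index wins) are all hypothetical choices made for this example.

```python
from itertools import permutations

def solve_sequential(order, payoff, n_actions, chosen=None):
    """Backward induction for a sequential game with perfect information.

    order     : tuple of agent ids, first mover first (hypothetical interface)
    payoff    : function mapping a joint action (tuple indexed by agent id)
                to a tuple of payoffs, one per agent
    n_actions : number of discrete actions available to every agent
    Returns the joint action reached when each agent, in turn, picks the
    action that maximizes its own payoff given how later agents respond.
    Ties are broken in favor of the lowest action index.
    """
    chosen = dict(chosen or {})
    if len(chosen) == len(order):
        return tuple(chosen[i] for i in range(len(order)))
    mover = order[len(chosen)]
    best_joint, best_value = None, None
    for a in range(n_actions):
        # Anticipate the responses of all agents that move after `mover`.
        joint = solve_sequential(order, payoff, n_actions, {**chosen, mover: a})
        value = payoff(joint)[mover]
        if best_value is None or value > best_value:
            best_joint, best_value = joint, value
    return best_joint

# Illustrative two-agent coordination game (payoffs indexed [a0][a1]).
A = [[2, 0], [0, 1]]  # payoffs of agent 0
B = [[1, 0], [0, 2]]  # payoffs of agent 1

def payoff(joint):
    a0, a1 = joint
    return (A[a0][a1], B[a0][a1])

results = {}
for order in permutations(range(2)):
    joint = solve_sequential(order, payoff, n_actions=2)
    results[order] = (joint, payoff(joint))
    print(order, joint, payoff(joint))
```

In this toy game, letting agent 0 move first yields the joint action (0, 0) with payoffs (2, 1), while letting agent 1 move first yields (1, 1) with payoffs (1, 2). The same payoff tables therefore produce different outcomes under different orders, which is the kind of order dependence the abstract describes for the N-level case.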

Problem

Research questions and friction points this paper is trying to address.

Stackelberg Game

multi-agent systems

decision order

equilibrium point

priority scheduling

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Priority Adjustment

Stackelberg Game

Decision Order Optimization

Spatio-Temporal Sequential Markov Game

Multi-Agent Reinforcement Learning
