🤖 AI Summary
Dynamic nonlinear multi-agent games pose significant challenges due to time-varying interaction topologies and nonstationary Nash equilibria.

**Method:** This paper proposes a model-free distributed policy gradient method that learns directly from observed state transitions and cost data, without requiring prior knowledge of the environment dynamics. It introduces a self-attention mechanism to parameterize nonlinear feedback policies, marking the first application of such mechanisms to multi-agent game-theoretic policy design under adaptive communication topologies and multi-team architectures. Policy co-optimization is achieved via end-to-end distributed training.

**Contribution/Results:** The method demonstrates substantial performance gains over baseline approaches in nonlinear regulation tasks and in both simulated and real-world multi-robot pursuit-and-evasion games, validating its effectiveness, its robustness to topology changes and environmental nonstationarity, and its generalizability across heterogeneous agent configurations.
📝 Abstract
Multi-agent games in dynamic nonlinear settings are challenging due to the time-varying interactions among the agents and the non-stationarity of the (potential) Nash equilibria. In this paper, we consider model-free games, where agent transitions and costs are observed without knowledge of the transition and cost functions that generate them. We propose a policy gradient approach to learn distributed policies that follow the communication structure in multi-team games, with multiple agents per team. Our formulation is inspired by the structure of distributed policies in linear quadratic games, which take the form of time-varying linear feedback gains. In the nonlinear case, we model the policies as nonlinear feedback gains, parameterized by self-attention layers to account for the time-varying multi-agent communication topology. We demonstrate that our distributed policy gradient approach achieves strong performance in several settings, including distributed linear and nonlinear regulation, and simulated and real multi-robot pursuit-and-evasion games.
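To make the policy parameterization concrete: in a linear quadratic game, each agent's policy is a linear feedback law u_i = -K_t x_i with a time-varying gain K_t; the abstract generalizes this by letting the gain depend nonlinearly on the joint state through masked self-attention over the current communication topology. The sketch below is an illustrative NumPy toy, not the paper's implementation: the projection matrices `Wq`, `Wk`, `Wv`, `Wg`, the dimensions, and the gain construction are all assumptions chosen to show the shape of the idea (in the paper these parameters would be trained end-to-end by the distributed policy gradient).

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, d = 3, 4  # number of agents and per-agent state dimension (illustrative)

# Learned parameters in the actual method; random placeholders here.
Wq = 0.1 * rng.normal(size=(d, d))      # query projection
Wk = 0.1 * rng.normal(size=(d, d))      # key projection
Wv = 0.1 * rng.normal(size=(d, d))      # value projection
Wg = 0.1 * rng.normal(size=(d, d * d))  # maps attention features to a d x d gain

def attention_policy(X, adj):
    """Per-agent actions u_i = -K_i(X) x_i, where the state-dependent gain
    K_i is built from self-attention masked by the communication topology.

    X:   (n_agents, d) stacked agent states
    adj: (n_agents, n_agents) adjacency mask, adj[i, j] > 0 iff i hears j
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = (Q @ K.T) / np.sqrt(d)
    scores = np.where(adj > 0, scores, -1e9)      # mask non-neighbors
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)             # row-wise softmax
    feat = w @ V                                  # aggregated neighbor features
    gains = (feat @ Wg).reshape(n_agents, d, d)   # nonlinear feedback gains K_i(X)
    return -np.einsum('aij,aj->ai', gains, X)     # u_i = -K_i(X) x_i

# A topology change only alters the attention mask, not the parameters,
# which is what lets one policy handle time-varying communication graphs.
X = rng.normal(size=(n_agents, d))
u_full = attention_policy(X, np.ones((n_agents, n_agents)))  # fully connected
u_iso = attention_policy(X, np.eye(n_agents))                # self-loops only
```

Because the mask enters only through the softmax, the same parameters produce valid distributed policies under any communication graph, matching the abstract's claim of handling time-varying topologies.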