Policy Gradient with Self-Attention for Model-Free Distributed Nonlinear Multi-Agent Games

📅 2025-09-22
📈 Citations: 0 · Influential: 0
🤖 AI Summary
Dynamic nonlinear multi-agent games are challenging due to time-varying interaction topologies and non-stationary Nash equilibria. Method: This paper proposes a model-free distributed policy gradient method that learns directly from observed state transitions and cost data, without prior knowledge of the transition or cost functions that generate them. Nonlinear feedback policies are parameterized by a self-attention mechanism, which the authors present as the first application of self-attention to multi-agent game-theoretic policy design under adaptive communication topologies and multi-team architectures. Policies are co-optimized via end-to-end distributed training. Contribution/Results: The method shows substantial gains over baselines in nonlinear regulation tasks and in simulated and real-world multi-robot pursuit–evasion games, demonstrating robustness to topology changes and environmental non-stationarity, and generalization across heterogeneous agent configurations.
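
To make the model-free learning loop concrete, here is a minimal sketch of a policy gradient update driven only by observed states, actions, and costs. It assumes a REINFORCE-style estimator and a hypothetical `policy.log_prob` interface; the paper's exact estimator and its distributed implementation may differ.

```python
# Minimal sketch of a model-free policy gradient update (REINFORCE-style).
# Only observed states, actions, and costs are used; the transition and
# cost functions themselves are never modeled. `policy.log_prob` is a
# hypothetical interface, not the paper's implementation.
import torch

def policy_gradient_step(policy, optimizer, rollouts, gamma=0.99):
    """One update from a batch of observed trajectories.

    rollouts: list of (states, actions, costs) tensors, one tuple per
    trajectory, with costs of shape (T,).
    """
    loss = torch.tensor(0.0)
    for states, actions, costs in rollouts:
        T = costs.shape[0]
        # Discounted cost-to-go from each step (costs are minimized).
        returns = torch.zeros(T)
        running = 0.0
        for t in reversed(range(T)):
            running = costs[t] + gamma * running
            returns[t] = running
        log_probs = policy.log_prob(states, actions)  # shape (T,)
        # Descending this surrogate follows an estimate of the gradient
        # of the expected discounted cost.
        loss = loss + (log_probs * returns.detach()).sum()
    loss = loss / len(rollouts)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```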

📝 Abstract
Multi-agent games in dynamic nonlinear settings are challenging due to the time-varying interactions among the agents and the non-stationarity of the (potential) Nash equilibria. In this paper we consider model-free games, where agent transitions and costs are observed without knowledge of the transition and cost functions that generate them. We propose a policy gradient approach to learn distributed policies that follow the communication structure in multi-team games, with multiple agents per team. Our formulation is inspired by the structure of distributed policies in linear quadratic games, which take the form of time-varying linear feedback gains. In the nonlinear case, we model the policies as nonlinear feedback gains, parameterized by self-attention layers to account for the time-varying multi-agent communication topology. We demonstrate that our distributed policy gradient approach achieves strong performance in several settings, including distributed linear and nonlinear regulation, and simulated and real multi-robot pursuit-and-evasion games.
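
The policy structure described in the abstract, nonlinear feedback gains produced by self-attention over the communication topology, could look roughly like the sketch below. The dimensions, the single attention layer, and the linear gain readout are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch of a self-attention feedback-gain policy in the spirit of
# the abstract: each agent attends over its neighbors' states, and the
# attended embedding is read out as a state-feedback gain K_i. All sizes
# and the single-layer design are illustrative assumptions.
import torch
import torch.nn as nn

class AttentionFeedbackPolicy(nn.Module):
    def __init__(self, state_dim, action_dim, embed_dim=32, num_heads=4):
        super().__init__()
        self.embed = nn.Linear(state_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Maps each agent's attended embedding to a flattened gain matrix.
        self.gain = nn.Linear(embed_dim, action_dim * state_dim)
        self.state_dim, self.action_dim = state_dim, action_dim

    def forward(self, states, comm_mask=None):
        # states: (n_agents, state_dim)
        # comm_mask: (n_agents, n_agents) bool; True blocks attention between
        # agents that cannot communicate. Self-loops must stay unmasked so
        # no agent's attention row is fully empty.
        h = self.embed(states).unsqueeze(0)              # (1, n, d)
        h, _ = self.attn(h, h, h, attn_mask=comm_mask)   # masked self-attention
        K = self.gain(h.squeeze(0))                      # (n, a*s)
        K = K.view(-1, self.action_dim, self.state_dim)  # per-agent gains K_i
        # State feedback u_i = -K_i(x) x_i; K_i depends on the attended joint
        # state, so the overall policy is a nonlinear feedback law.
        return -torch.einsum('nas,ns->na', K, states)
```

Because each gain K_i is recomputed from the attended joint state at every step, this control law generalizes the time-varying linear feedback gains of the linear quadratic case that the abstract cites as inspiration.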
Problem

Research questions and friction points this paper is trying to address.

Learning distributed policies for model-free multi-agent games
Addressing time-varying interactions and non-stationary Nash equilibria
Using self-attention to handle nonlinear multi-agent communication topologies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Policy gradient approach for distributed nonlinear multi-agent games
Self-attention layers model the time-varying communication topology (see the mask-construction sketch after this list)
Nonlinear feedback gains parameterized by attention mechanisms
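
As referenced above, here is a minimal sketch of how a time-varying communication graph might be turned into the boolean attention mask consumed by a policy like the one sketched under the abstract. The distance-based connectivity rule is an illustrative assumption, not the paper's construction.

```python
# Minimal sketch of building the boolean attention mask from a time-varying
# communication graph; the distance-based rule is an illustrative assumption.
import torch

def comm_mask_from_positions(positions, radius):
    """positions: (n_agents, 2) tensor. Returns an (n, n) bool mask where
    True blocks attention between agents outside communication range."""
    dists = torch.cdist(positions, positions)  # pairwise distances
    mask = dists > radius                      # out of range -> blocked
    mask.fill_diagonal_(False)                 # always keep self-loops
    return mask
```

Recomputing the mask at every timestep and passing it to the policy, e.g. `policy(states, comm_mask=comm_mask_from_positions(positions, radius))`, lets the attention pattern track the evolving topology.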
Authors

Eduardo Sebastián
University of Cambridge
Robotics · Networked Systems · Control · Learning

Maitrayee Keskar
Department of Electrical and Computer Engineering, University of California San Diego, USA

Eeman Iqbal
Department of Electrical and Computer Engineering, University of California San Diego, USA

Eduardo Montijano
Universidad de Zaragoza, Spain
Robotics · Computer Vision · Distributed Systems

Carlos Sagüés
RoPeRt group, DIIS - I3A, Universidad de Zaragoza, Spain

Nikolay Atanasov
Department of Electrical and Computer Engineering, University of California San Diego, USA