Policy Gradient with Self-Attention for Model-Free Distributed Nonlinear Multi-Agent Games

📅 2025-09-22
📈 Citations: 0 · Influential: 0
🤖 AI Summary
Dynamic nonlinear multi-agent games are challenging due to time-varying interaction topologies and non-stationary Nash equilibria. Method: This paper proposes a model-free distributed policy gradient method that learns directly from observed state transitions and cost data, without prior knowledge of the transition or cost functions that generate them. Nonlinear feedback policies are parameterized by a self-attention mechanism, which the authors present as the first application of self-attention to multi-agent game-theoretic policy design under adaptive communication topologies and multi-team architectures. Policies are co-optimized via end-to-end distributed training. Contribution/Results: The method shows substantial gains over baselines in nonlinear regulation tasks and in simulated and real-world multi-robot pursuit–evasion games, demonstrating robustness to topology changes and environmental non-stationarity, and generalization across heterogeneous agent configurations.
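
To make the model-free learning loop concrete, here is a minimal sketch of a policy gradient update driven only by observed states, actions, and costs. It assumes a REINFORCE-style estimator and a hypothetical `policy.log_prob` interface; the paper's exact estimator and its distributed implementation may differ.

```python
# Minimal sketch of a model-free policy gradient update (REINFORCE-style).
# Only observed states, actions, and costs are used; the transition and
# cost functions themselves are never modeled. `policy.log_prob` is a
# hypothetical interface, not the paper's implementation.
import torch

def policy_gradient_step(policy, optimizer, rollouts, gamma=0.99):
    """One update from a batch of observed trajectories.

    rollouts: list of (states, actions, costs) tensors, one tuple per
    trajectory, with costs of shape (T,).
    """
    loss = torch.tensor(0.0)
    for states, actions, costs in rollouts:
        T = costs.shape[0]
        # Discounted cost-to-go from each step (costs are minimized).
        returns = torch.zeros(T)
        running = 0.0
        for t in reversed(range(T)):
            running = costs[t] + gamma * running
            returns[t] = running
        log_probs = policy.log_prob(states, actions)  # shape (T,)
        # Descending this surrogate follows an estimate of the gradient
        # of the expected discounted cost.
        loss = loss + (log_probs * returns.detach()).sum()
    loss = loss / len(rollouts)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```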

📝 Abstract
Multi-agent games in dynamic nonlinear settings are challenging due to the time-varying interactions among the agents and the non-stationarity of the (potential) Nash equilibria. In this paper we consider model-free games, where agent transitions and costs are observed without knowledge of the transition and cost functions that generate them. We propose a policy gradient approach to learn distributed policies that follow the communication structure in multi-team games, with multiple agents per team. Our formulation is inspired by the structure of distributed policies in linear quadratic games, which take the form of time-varying linear feedback gains. In the nonlinear case, we model the policies as nonlinear feedback gains, parameterized by self-attention layers to account for the time-varying multi-agent communication topology. We demonstrate that our distributed policy gradient approach achieves strong performance in several settings, including distributed linear and nonlinear regulation, and simulated and real multi-robot pursuit-and-evasion games.
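
The policy structure described in the abstract, nonlinear feedback gains produced by self-attention over the communication topology, could look roughly like the sketch below. The dimensions, the single attention layer, and the linear gain readout are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch of a self-attention feedback-gain policy in the spirit of
# the abstract: each agent attends over its neighbors' states, and the
# attended embedding is read out as a state-feedback gain K_i. All sizes
# and the single-layer design are illustrative assumptions.
import torch
import torch.nn as nn

class AttentionFeedbackPolicy(nn.Module):
    def __init__(self, state_dim, action_dim, embed_dim=32, num_heads=4):
        super().__init__()
        self.embed = nn.Linear(state_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Maps each agent's attended embedding to a flattened gain matrix.
        self.gain = nn.Linear(embed_dim, action_dim * state_dim)
        self.state_dim, self.action_dim = state_dim, action_dim

    def forward(self, states, comm_mask=None):
        # states: (n_agents, state_dim)
        # comm_mask: (n_agents, n_agents) bool; True blocks attention between
        # agents that cannot communicate. Self-loops must stay unmasked so
        # no agent's attention row is fully empty.
        h = self.embed(states).unsqueeze(0)              # (1, n, d)
        h, _ = self.attn(h, h, h, attn_mask=comm_mask)   # masked self-attention
        K = self.gain(h.squeeze(0))                      # (n, a*s)
        K = K.view(-1, self.action_dim, self.state_dim)  # per-agent gains K_i
        # State feedback u_i = -K_i(x) x_i; K_i depends on the attended joint
        # state, so the overall policy is a nonlinear feedback law.
        return -torch.einsum('nas,ns->na', K, states)
```

Because each gain K_i is recomputed from the attended joint state at every step, this control law generalizes the time-varying linear feedback gains of the linear quadratic case that the abstract cites as inspiration.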
Problem

Research questions and friction points this paper is trying to address.

Learning distributed policies for model-free multi-agent games
Addressing time-varying interactions and non-stationary Nash equilibria
Using self-attention to handle nonlinear multi-agent communication topologies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Policy gradient approach for distributed nonlinear multi-agent games
Self-attention layers model the time-varying communication topology (see the mask-construction sketch after this list)
Nonlinear feedback gains parameterized by attention mechanisms
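
As referenced above, here is a minimal sketch of how a time-varying communication graph might be turned into the boolean attention mask consumed by a policy like the one sketched under the abstract. The distance-based connectivity rule is an illustrative assumption, not the paper's construction.

```python
# Minimal sketch of building the boolean attention mask from a time-varying
# communication graph; the distance-based rule is an illustrative assumption.
import torch

def comm_mask_from_positions(positions, radius):
    """positions: (n_agents, 2) tensor. Returns an (n, n) bool mask where
    True blocks attention between agents outside communication range."""
    dists = torch.cdist(positions, positions)  # pairwise distances
    mask = dists > radius                      # out of range -> blocked
    mask.fill_diagonal_(False)                 # always keep self-loops
    return mask
```

Recomputing the mask at every timestep and passing it to the policy, e.g. `policy(states, comm_mask=comm_mask_from_positions(positions, radius))`, lets the attention pattern track the evolving topology.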
Authors

Eduardo Sebastián
University of Cambridge
Robotics · Networked Systems · Control · Learning

Maitrayee Keskar
Department of Electrical and Computer Engineering, University of California San Diego, USA

Eeman Iqbal
Department of Electrical and Computer Engineering, University of California San Diego, USA

Eduardo Montijano
Universidad de Zaragoza, Spain
Robotics · Computer Vision · Distributed Systems

Carlos Sagüés
RoPeRt group, DIIS - I3A, Universidad de Zaragoza, Spain

Nikolay Atanasov
Department of Electrical and Computer Engineering, University of California San Diego, USA