ARAC: Adaptive Regularized Multi-Agent Soft Actor-Critic in Graph-Structured Adversarial Games

📅 2025-11-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In graph-structured multi-agent adversarial tasks, such as pursuit-evasion and combat, sparse rewards and complex dynamic interactions severely hinder efficient policy learning. To address this, we propose Adaptive Regularized Multi-Agent Soft Actor-Critic (ARAC). Methodologically, ARAC employs an attention-based graph neural network (GNN) to model time-varying topological dependencies among agents and introduces an adaptive divergence regularization mechanism: a reference policy guides exploration early in training, while the regularization strength is progressively attenuated as policy performance improves, thereby preventing premature convergence to suboptimal solutions. Experiments demonstrate that ARAC achieves significantly faster convergence and higher task success rates in both pursuit and confrontation settings. Moreover, it exhibits strong scalability across varying numbers of agents and consistently outperforms state-of-the-art MARL baselines in overall performance.

📝 Abstract
In graph-structured multi-agent reinforcement learning (MARL) adversarial tasks such as pursuit and confrontation, agents must coordinate under highly dynamic interactions, where sparse rewards hinder efficient policy learning. We propose Adaptive Regularized Multi-Agent Soft Actor-Critic (ARAC), which integrates an attention-based graph neural network (GNN) for modeling agent dependencies with an adaptive divergence regularization mechanism. The GNN enables expressive representation of spatial relations and state features in graph environments. Divergence regularization can serve as policy guidance to alleviate the sparse reward problem, but it may lead to suboptimal convergence when the reference policy itself is imperfect. The adaptive divergence regularization mechanism enables the framework to exploit reference policies for efficient exploration in the early stages, while gradually reducing reliance on them as training progresses to avoid inheriting their limitations. Experiments in pursuit and confrontation scenarios demonstrate that ARAC achieves faster convergence, higher final success rates, and stronger scalability across varying numbers of agents compared with MARL baselines, highlighting its effectiveness in complex graph-structured environments.
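The attention-based aggregation the abstract describes can be illustrated with a minimal sketch: each agent weights its neighbors' features by a learned compatibility score before aggregating. Scalar features and fixed weights are used here for brevity; the function and parameter names are illustrative, not the paper's implementation, which uses vector embeddings and learned weight matrices.

```python
import math

def attention_aggregate(h_i, neighbor_feats, w_q=1.0, w_k=1.0):
    """Aggregate a neighborhood for one agent with softmax attention.

    h_i            -- the agent's own (scalar) feature
    neighbor_feats -- features of graph neighbors
    w_q, w_k       -- stand-ins for learned query/key projections (assumed)
    """
    # Compatibility scores between the agent and each neighbor
    scores = [math.exp((w_q * h_i) * (w_k * h_j)) for h_j in neighbor_feats]
    total = sum(scores)
    # Softmax-weighted sum of neighbor features
    return sum((s / total) * h_j for s, h_j in zip(scores, neighbor_feats))
```

Because the attention weights are recomputed from the current features at every step, the aggregation adapts to the time-varying interaction topology without a fixed adjacency structure.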
Problem

Research questions and friction points this paper is trying to address.

Addresses sparse rewards in graph-structured multi-agent adversarial games
Models agent dependencies using attention-based graph neural networks
Adaptively balances reference policy guidance to prevent suboptimal convergence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses attention-based GNN for modeling agent dependencies
Adaptive divergence regularization for efficient exploration
Gradually reduces reliance on imperfect reference policies
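The adaptive regularization idea above can be sketched as a per-sample soft actor-critic objective with a divergence penalty toward a reference policy, whose weight is annealed as performance improves. This is a minimal illustration under assumed forms: the function names, the KL surrogate `log_pi - log_ref`, and the threshold-based decay schedule are assumptions, not the paper's exact loss or schedule.

```python
def arac_actor_loss(log_pi, q_value, log_ref, alpha=0.2, beta=1.0):
    """Per-sample regularized actor objective (sketch).

    alpha * log_pi - q_value  -- standard SAC actor loss term
    beta * (log_pi - log_ref) -- penalty pulling the policy toward a
                                 reference policy (assumed surrogate form)
    """
    return alpha * log_pi - q_value + beta * (log_pi - log_ref)

def anneal_beta(beta, success_rate, decay=0.99, threshold=0.5):
    """Attenuate the regularization weight once the learned policy performs
    well enough on its own (an assumed performance-gated schedule)."""
    return beta * decay if success_rate > threshold else beta
```

Early in training, a large `beta` makes the reference policy dominate exploration; as the success rate rises, `beta` shrinks, so the agent stops inheriting the reference policy's limitations.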
Ruochuan Shi
Institute of Automation, Chinese Academy of Sciences, School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Runyu Lu
School of Artificial Intelligence, University of Chinese Academy of Sciences, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Yuanheng Zhu
Institute of Automation, Chinese Academy of Sciences
Dongbin Zhao
Institute of Automation, Chinese Academy of Sciences
Deep Reinforcement Learning · Adaptive Dynamic Programming · Game AI · Smart Driving · Robotics