Empirical Study on Robustness and Resilience in Cooperative Multi-Agent Reinforcement Learning

📅 2025-10-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Cooperative multi-agent reinforcement learning (MARL) exhibits insufficient robustness and resilience during simulation-to-reality transfer. Method: We conduct over 80,000 experiments across four real-world robotic environments to systematically evaluate policy stability and disturbance rejection under multiple uncertainties—including action and observation noise—and perform extensive ablation studies. Contribution/Results: We reveal a strong nonlinear trade-off among cooperative performance, robustness, and resilience, and demonstrate that robustness does not generalize across distinct perturbation types. Crucially, we discover that simple hyperparameter tuning—without architectural or algorithmic modifications—significantly enhances cooperation, robustness, and resilience across mainstream MARL algorithms (e.g., QMIX, MAPPO) and diverse robustification methods. This improvement generalizes across algorithms and methods, offering a lightweight, broadly applicable pathway to enhance the trustworthiness of MARL systems in real-world deployment.

📝 Abstract
In cooperative Multi-Agent Reinforcement Learning (MARL), it is common practice to tune hyperparameters in ideal simulated environments to maximize cooperative performance. However, policies tuned for cooperation often fail to maintain robustness and resilience under real-world uncertainties. Building trustworthy MARL systems requires a deep understanding of robustness, which ensures stability under uncertainties, and resilience, the ability to recover from disruptions--a concept extensively studied in control systems but largely overlooked in MARL. In this paper, we present a large-scale empirical study comprising over 82,620 experiments to evaluate cooperation, robustness, and resilience in MARL across 4 real-world environments, 13 uncertainty types, and 15 hyperparameters. Our key findings are: (1) Under mild uncertainty, optimizing cooperation improves robustness and resilience, but this link weakens as perturbations intensify. Robustness and resilience also vary by algorithm and uncertainty type. (2) Robustness and resilience do not generalize across uncertainty modalities or agent scopes: policies robust to action noise on all agents may fail under observation noise on a single agent. (3) Hyperparameter tuning is critical for trustworthy MARL: surprisingly, standard practices like parameter sharing, GAE, and PopArt can hurt robustness, while early stopping, high critic learning rates, and Leaky ReLU consistently help. By optimizing hyperparameters only, we observe substantial improvement in cooperation, robustness, and resilience across all MARL backbones, with the phenomenon also generalizing to robust MARL methods across these backbones. Code and results available at https://github.com/BUAA-TrustworthyMARL/adv_marl_benchmark .
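As a rough illustration of the abstract's finding (3), the hyperparameter choices it reports as helpful or harmful for robustness can be sketched as a configuration check. This is a hypothetical sketch, not the paper's actual settings: all names and concrete values below are assumptions for illustration only.

```python
# Hypothetical config reflecting the abstract's reported findings.
# All field names and values are illustrative assumptions, not the paper's settings.
robust_friendly_config = {
    "parameter_sharing": False,   # abstract: parameter sharing can hurt robustness
    "use_gae": False,             # abstract: GAE can hurt robustness
    "use_popart": False,          # abstract: PopArt can hurt robustness
    "early_stopping": True,       # abstract: consistently helps
    "critic_lr": 5e-3,            # "high critic learning rate" (value assumed)
    "activation": "leaky_relu",   # abstract: Leaky ReLU consistently helps
}

def count_robust_choices(cfg: dict) -> int:
    """Count how many of the abstract's robustness-friendly settings are active."""
    checks = [
        not cfg["parameter_sharing"],
        not cfg["use_gae"],
        not cfg["use_popart"],
        cfg["early_stopping"],
        cfg["activation"] == "leaky_relu",
    ]
    return sum(checks)
```

For instance, `count_robust_choices(robust_friendly_config)` returns 5, meaning all five of the abstract's qualitative recommendations are enabled in this hypothetical configuration.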
Problem

Research questions and friction points this paper is trying to address.

Evaluating robustness and resilience in cooperative multi-agent reinforcement learning under uncertainties
Analyzing how hyperparameter tuning affects trustworthiness across diverse real-world environments
Investigating generalization limitations of policies across different uncertainty types and agent scopes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale empirical study with 82,620 experiments
Evaluates robustness and resilience across uncertainty types
Optimizes hyperparameters to improve trustworthiness in MARL
Simin Li
State Key Laboratory of Complex & Critical Software Environment, Beihang University, China
Zihao Mao
State Key Laboratory of Complex & Critical Software Environment, Beihang University, China
Hanxiao Li
State Key Laboratory of Complex & Critical Software Environment, Beihang University, China
Zonglei Jing
Beihang University
Machine Learning, Reinforcement Learning, Optimal Control
Zhuohang Bian
State Key Laboratory of Complex & Critical Software Environment, Beihang University, China
Jun Guo
State Key Laboratory of Complex & Critical Software Environment, Beihang University, China
Li Wang
State Key Laboratory of Complex & Critical Software Environment, Beihang University, China
Zhuoran Han
State Key Laboratory of Complex & Critical Software Environment, Beihang University, China
Ruixiao Xu
State Key Laboratory of Complex & Critical Software Environment, Beihang University, China
Xin Yu
State Key Laboratory of Complex & Critical Software Environment, Beihang University, China
Chengdong Ma
Peking University
Reinforcement Learning, Multi-Agent Systems
Yuqing Ma
State Key Laboratory of Complex & Critical Software Environment, Beihang University, China
Bo An
Nanyang Technological University, Singapore
Yaodong Yang
Institute of Artificial Intelligence, Peking University, China
Weifeng Lv
State Key Laboratory of Complex & Critical Software Environment, Beihang University, China
Xianglong Liu
State Key Laboratory of Complex & Critical Software Environment, Beihang University, China; Zhongguancun Laboratory, China; Institute of Data Space, Hefei Comprehensive National Science Center, China