🤖 AI Summary
This study uncovers a novel security vulnerability in collaborative multi-agent reinforcement learning (c-MARL) under realistic deployment settings: adversaries can mislead agent decisions solely by perturbing environmental observations, without observing or accessing the agents' policies.
Method: We propose the first black-box observation-perturbation attack framework for c-MARL that requires no policy access and avoids surrogate models, integrating gradient estimation with policy-agnostic perturbation generation.
Contribution/Results: Our method achieves unprecedented cross-algorithm and cross-environment transferability. Evaluated across 22 environments spanning three benchmark platforms, it attains effective attacks within just 1,000 environment interactions, improving sample efficiency by three orders of magnitude over prior approaches. The results demonstrate severe runtime fragility in deployed c-MARL systems, establishing a new perspective on multi-agent security and providing a practical evaluation tool for robustness assessment.
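To make the "gradient estimation without policy access" idea concrete, here is a minimal illustrative sketch (not the paper's actual algorithm) of an NES-style zeroth-order attack: the adversary treats the victims' episodic return as a black-box function of an observation perturbation, estimates its gradient from return queries alone, and descends it within an L-infinity budget. The `victim_return` function and all hyperparameters are toy stand-ins invented for this example.

```python
import numpy as np

def victim_return(obs):
    """Toy stand-in for rolling out the deployed agents on (perturbed)
    observations; the true return function is a black box to the attacker.
    Here, return is highest when observations sit at their clean values."""
    return -np.sum(obs ** 2)

def nes_attack(obs, budget=0.5, sigma=0.1, lr=0.05, n_samples=20, steps=50, seed=0):
    """Craft an additive observation perturbation `delta` that lowers the
    victims' return, using only black-box return queries (no policy access)."""
    rng = np.random.default_rng(seed)
    delta = np.zeros_like(obs)
    for _ in range(steps):
        grad = np.zeros_like(obs)
        for _ in range(n_samples):
            u = rng.standard_normal(obs.shape)
            # Antithetic sampling: two return queries per random direction.
            r_plus = victim_return(obs + delta + sigma * u)
            r_minus = victim_return(obs + delta - sigma * u)
            grad += (r_plus - r_minus) * u
        grad /= 2 * sigma * n_samples  # NES estimate of d(return)/d(delta)
        # Descend the return (the attacker wants it low), staying in the
        # L-infinity ball of radius `budget`.
        delta = np.clip(delta - lr * grad, -budget, budget)
    return delta

clean_obs = 0.2 * np.ones(4)
delta = nes_attack(clean_obs)
```

The real setting differs in important ways (stochastic multi-agent rollouts, per-timestep observations, and the paper's policy-agnostic perturbation generation), but the sketch shows why such attacks need no surrogate model: only return queries enter the gradient estimate.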
📝 Abstract
Collaborative multi-agent reinforcement learning (c-MARL) has rapidly evolved, offering state-of-the-art algorithms for real-world applications, including in sensitive domains. However, a key challenge to its widespread adoption is the lack of a thorough investigation into its vulnerabilities to adversarial attacks. Existing work predominantly focuses on training-time attacks or unrealistic scenarios, such as access to policy weights or the ability to train surrogate policies. In this paper, we investigate new vulnerabilities under more realistic and constrained conditions, assuming an adversary can only collect and perturb the observations of deployed agents. We also consider scenarios where the adversary has no such access at all. We propose simple yet highly effective algorithms for generating adversarial perturbations designed to misalign how victim agents perceive their environment. Our approach is empirically validated on three benchmarks and 22 environments, demonstrating its effectiveness across diverse algorithms and environments. Furthermore, we show that our algorithm is sample-efficient, requiring only 1,000 samples compared to the millions needed by previous methods.