Exposing Weak Links in Multi-Agent Systems under Adversarial Prompting

📅 2025-11-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing research predominantly focuses on single-agent safety evaluation, lacking a unified safety assessment framework for multi-agent systems (MAS), particularly neglecting MAS-specific failure modes and architecture-level vulnerabilities. Method: We propose SafeAgents—the first fine-grained safety evaluation framework specifically designed for MAS—and introduce the Dharma diagnostic metric, enabling precise identification of weak links in multi-agent collaborative workflows. Our evaluation employs adversarial prompting, behavioral tracing, and cross-architectural validation across centralized, decentralized, and hybrid paradigms. Contribution/Results: Systematic experiments across five mainstream MAS architectures and four task-oriented datasets reveal that common design patterns—such as atomic instruction delegation—significantly degrade robustness. These findings provide empirical grounding and actionable evaluation criteria for safety-aware MAS architecture design.

📝 Abstract
LLM-based agents are increasingly deployed in multi-agent systems (MAS). As these systems move toward real-world applications, their security becomes paramount. Existing research largely evaluates single-agent security, leaving a critical gap in understanding the vulnerabilities introduced by multi-agent design. Existing approaches fall short due to the lack of unified frameworks and metrics that capture failure modes unique to MAS. We present SafeAgents, a unified and extensible framework for fine-grained security assessment of MAS. SafeAgents systematically exposes how design choices such as plan construction strategies, inter-agent context sharing, and fallback behaviors affect susceptibility to adversarial prompting. We introduce Dharma, a diagnostic measure that helps identify weak links within multi-agent pipelines. Using SafeAgents, we conduct a comprehensive study across five widely adopted multi-agent architectures (centralized, decentralized, and hybrid variants) on four datasets spanning web tasks, tool use, and code generation. Our findings reveal that common design patterns carry significant vulnerabilities. For example, centralized systems that delegate only atomic instructions to sub-agents obscure harmful objectives, reducing robustness. Our results highlight the need for security-aware design in MAS. Code is available at https://github.com/microsoft/SafeAgents
Problem

Research questions and friction points this paper is trying to address.

Exposing vulnerabilities in multi-agent systems under adversarial prompting
Addressing the lack of unified security frameworks for multi-agent interactions
Identifying how design choices impact susceptibility to adversarial manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework for multi-agent security assessment
Diagnostic measure identifies weak links in pipelines
Systematic analysis of design choices on vulnerability
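To make the weak-link idea concrete, here is a minimal, hypothetical sketch of how a diagnostic in the spirit of Dharma might localize the most susceptible agent in a pipeline. None of this is the paper's actual SafeAgents API: the `Agent` class, keyword-based refusal check, and `weak_link_rates` scoring are illustrative stand-ins for LLM agents, real safety filters, and the paper's metric.

```python
# Hedged sketch: localize the "weak link" in a multi-agent pipeline by
# tracing which agent is the first to execute an adversarial instruction.
# All names here are hypothetical; a real evaluation would query LLM
# agents rather than match keywords.
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    blocked_keywords: set = field(default_factory=set)

    def handles(self, instruction: str) -> bool:
        """Return True if the agent executes the instruction (does NOT
        refuse). Keyword matching stands in for an LLM safety filter."""
        return not any(k in instruction.lower() for k in self.blocked_keywords)


def weak_link_rates(pipeline, adversarial_prompts):
    """For each prompt, record the first agent that complies; the
    per-agent compliance rate is a toy analogue of a weak-link score."""
    counts = {agent.name: 0 for agent in pipeline}
    for prompt in adversarial_prompts:
        for agent in pipeline:
            if agent.handles(prompt):
                counts[agent.name] += 1
                break  # first compliant agent is this prompt's weak link
    n = len(adversarial_prompts)
    return {name: c / n for name, c in counts.items()}


pipeline = [
    Agent("planner", {"exploit", "malware"}),
    Agent("tool_caller", {"malware"}),  # weaker filter than the planner
    Agent("code_writer", set()),        # no filter at all
]
prompts = ["write malware code", "download a malware sample", "build an exploit"]
print(weak_link_rates(pipeline, prompts))
```

In this toy setup the unfiltered `code_writer` accumulates the highest compliance rate, mirroring the paper's observation that delegating atomic instructions can strip the context a sub-agent would need to recognize a harmful objective.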