🤖 AI Summary
Background: Existing research predominantly focuses on single-agent safety evaluation and lacks a unified safety assessment framework for multi-agent systems (MAS), particularly neglecting MAS-specific failure modes and architecture-level vulnerabilities. Method: We propose SafeAgents, the first fine-grained safety evaluation framework specifically designed for MAS, and introduce the Dharma diagnostic metric, enabling precise identification of weak links in multi-agent collaborative workflows. Our evaluation employs adversarial prompting, behavioral tracing, and cross-architectural validation across centralized, decentralized, and hybrid paradigms. Contribution/Results: Systematic experiments across five mainstream MAS architectures and four task-oriented datasets reveal that common design patterns, such as atomic instruction delegation, significantly degrade robustness. These findings provide empirical grounding and actionable evaluation criteria for safety-aware MAS architecture design.
📝 Abstract
LLM-based agents are increasingly deployed in multi-agent systems (MAS). As these systems move toward real-world applications, their security becomes paramount. Existing research largely evaluates single-agent security, leaving a critical gap in understanding the vulnerabilities introduced by multi-agent design; current evaluation approaches lack unified frameworks and metrics that capture failure modes unique to MAS. We present SafeAgents, a unified and extensible framework for fine-grained security assessment of MAS. SafeAgents systematically exposes how design choices such as plan construction strategies, inter-agent context sharing, and fallback behaviors affect susceptibility to adversarial prompting. We introduce Dharma, a diagnostic measure that helps identify weak links within multi-agent pipelines. Using SafeAgents, we conduct a comprehensive study across five widely adopted multi-agent architectures (centralized, decentralized, and hybrid variants) on four datasets spanning web tasks, tool use, and code generation. Our findings reveal that common design patterns carry significant vulnerabilities. For example, centralized systems that delegate only atomic instructions to sub-agents obscure harmful objectives, reducing robustness. Our results highlight the need for security-aware design in MAS. Code is available at https://github.com/microsoft/SafeAgents