Benchmarking the Robustness of Agentic Systems to Adversarially-Induced Harms

📅 2025-08-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the security vulnerabilities of multi-agent systems (MAS) under adversarial attacks, addressing the absence of standardized evaluation frameworks for agent-level safety. Method: We propose a novel taxonomy of harmful behaviors and introduce BAD-ACTS—the first dedicated safety benchmark for MAS—comprising four representative environments and 188 high-quality adversarial behavior instances. Experiments demonstrate that compromising a single agent suffices to induce malicious system-wide behavior with high success rates; conventional prompt-engineering defenses prove ineffective, whereas cross-agent message monitoring significantly enhances robustness. Contribution/Results: Our work uncovers the cascade-failure mechanism triggered by intra-system adversarial interactions, establishes a reproducible and scalable security evaluation paradigm, and provides both theoretical foundations and practical tools for the trustworthy deployment of LLM-based agent systems.

📝 Abstract
Ensuring the safe use of agentic systems requires a thorough understanding of the range of malicious behaviors these systems may exhibit when under attack. In this paper, we evaluate the robustness of LLM-based agentic systems against attacks that aim to elicit harmful actions from agents. To this end, we propose a novel taxonomy of harms for agentic systems and a novel benchmark, BAD-ACTS, for studying the security of agentic systems with respect to a wide range of harmful actions. BAD-ACTS consists of 4 implementations of agentic systems in distinct application environments, as well as a dataset of 188 high-quality examples of harmful actions. This enables a comprehensive study of the robustness of agentic systems across a wide range of categories of harmful behaviors, available tools, and inter-agent communication structures. Using this benchmark, we analyze the robustness of agentic systems against an attacker that controls one of the agents in the system and aims to manipulate other agents to execute a harmful target action. Our results show that the attack has a high success rate, demonstrating that even a single adversarial agent within the system can have a significant impact on security. This attack remains effective even when agents use a simple prompting-based defense strategy. However, we additionally propose a more effective defense based on message monitoring. We believe that this benchmark provides a diverse testbed for security research on agentic systems. The benchmark can be found at github.com/JNoether/BAD-ACTS.
Problem

Research questions and friction points this paper is trying to address.

Evaluating robustness of LLM-based agentic systems against adversarial attacks
Proposing a novel benchmark to study security across harmful actions
Analyzing how single adversarial agents manipulate systems into harmful behaviors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposed novel taxonomy of harms for agentic systems
Introduced BAD-ACTS benchmark with diverse agent implementations
Developed message monitoring defense against adversarial manipulation
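The cross-agent message-monitoring defense can be pictured as a gate that inspects every inter-agent message before delivery and drops flagged ones. The sketch below is a hypothetical illustration, not the paper's implementation: the `is_harmful` keyword check stands in for whatever model-based judge the authors actually use, and the `Message`/inbox structures are assumptions made for the sake of a runnable example.

```python
from dataclasses import dataclass

@dataclass
class Message:
    sender: str
    recipient: str
    content: str

# Placeholder classifier: a real monitor would query an LLM judge on each
# message; a simple keyword check stands in here so the sketch is runnable.
HARMFUL_MARKERS = ("ignore previous instructions", "exfiltrate", "delete all")

def is_harmful(text: str) -> bool:
    lowered = text.lower()
    return any(marker in lowered for marker in HARMFUL_MARKERS)

def monitor_and_deliver(msg: Message, inbox: dict[str, list[Message]]) -> bool:
    """Inspect an inter-agent message before delivery; drop it if flagged.

    Returns True if the message was delivered, False if it was blocked,
    so the system can log or escalate blocked attempts.
    """
    if is_harmful(msg.content):
        return False  # blocked: the message never reaches the recipient agent
    inbox.setdefault(msg.recipient, []).append(msg)
    return True
```

Because the monitor sits between agents rather than inside any one agent's prompt, a compromised agent cannot simply instruct its peers to ignore it, which is one plausible reason such a defense outperforms prompting-based safeguards.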