🤖 AI Summary
This paper identifies a critical control-flow hijacking vulnerability in multi-agent systems when processing untrusted inputs (e.g., malicious web pages or email attachments): even if individual agents are robust against prompt injection and refuse harmful instructions, their inter-agent coordination mechanisms can be subverted to achieve arbitrary code execution and exfiltration of sensitive data within the containerized environment. Leveraging red-teaming on mainstream frameworks—including AutoGen and CrewAI—the authors conduct dynamic inter-agent communication monitoring and sandbox escape analysis to realize, for the first time in realistic settings, an end-to-end attack chain. Key contributions are: (1) identification and formalization of “control-flow hijacking” as a novel system-level attack surface; (2) empirical demonstration that agent-level security guarantees do not ensure system-level security; and (3) proposal of the first security evaluation paradigm specifically designed for multi-agent systems.
📝 Abstract
Multi-agent systems coordinate LLM-based agents to perform tasks on users' behalf. In real-world applications, multi-agent systems will inevitably interact with untrusted inputs, such as malicious Web content, files, email attachments, etc. Using several recently proposed multi-agent frameworks as concrete examples, we demonstrate that adversarial content can hijack control and communication within the system to invoke unsafe agents and functionalities. This results in a complete security breach, up to execution of arbitrary malicious code on the user's device and/or exfiltration of sensitive data from the user's containerized environment. We show that control-flow hijacking attacks succeed even if the individual agents are not susceptible to direct or indirect prompt injection, and even if they refuse to perform harmful actions.