🤖 AI Summary
This work addresses the vulnerability of multi-agent large language model (LLM) systems to prompt-based manipulation during the planning phase, which lets malicious signals propagate through task workflows. The study reveals, for the first time, how strongly workflow position and sycophantic framing influence that propagation. It introduces FlowSteer, an attack that requires no modification to the system architecture and instead uses a single crafted prompt to steer workflow formation, raising malicious success by up to 55% over naive prompting and transferring across multiple systems. To counter this threat, the authors propose FlowGuard, an input-side defense mechanism that combines social influence detection with black-box topology inference to suppress malicious prompts while preserving the utility of legitimate ones, reducing attack success by up to 34% and offering a new perspective on securing multi-agent LLM planning.
📝 Abstract
Multi-agent systems (MAS) powered by large language models (LLMs) increasingly adopt planner-executor architectures, where planners convert prompts into subtasks, roles, dependencies, and routing paths. This flexibility enables adaptive coordination, but it also exposes an attack surface in workflow formation: prompts can shape agent organization without modifying MAS infrastructure. We study this risk through social-influence probing of workflows to identify high-impact subtasks and malicious-signal propagation. The analysis reveals two vulnerabilities: workflow position can amplify or suppress a malicious signal, and sycophantic framing makes downstream agents more likely to relay it. We translate these findings into FlowSteer, a prompt-only workflow steering attack that converts vulnerability priors into one crafted prompt. FlowSteer aligns a malicious signal with influential task components and guides replanning toward dependencies that preserve propagation. Experiments show that FlowSteer increases malicious success by up to 55% over naive prompting, transfers across MAS setups, and remains effective with black-box topology inference. Because FlowSteer biases the planning signals that generate the workflow, MAS defenses that inspect only the generated workflow provide limited protection. We therefore introduce FlowGuard, an input-side defense that reduces malicious success by up to 34% while preserving prompt utility. Our results position workflow formation as a new safety frontier for multi-agent LLM systems, opening a planning-time security perspective on how agent coordination itself can be attacked and defended.
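The claim that workflow position can amplify or suppress a signal can be made concrete with a small sketch. This is not the paper's probing method, which the abstract does not specify. It is a minimal illustration that assumes the planner's output can be modeled as a directed graph mapping each subtask to the subtasks that consume its result. The function name `downstream_reach`, the `example_workflow` dictionary, and its subtask names are all hypothetical.

```python
from collections import deque


def downstream_reach(workflow):
    """Map each subtask to the set of subtasks reachable from it.

    `workflow` is an assumed adjacency mapping: subtask -> list of
    subtasks that consume its output. Reach is a rough proxy for how
    far an output produced at that position could travel.
    """
    reach = {}
    for start in workflow:
        seen = set()
        queue = deque(workflow[start])
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            queue.extend(workflow.get(node, []))
        reach[start] = seen
    return reach


# Hypothetical planner output, invented for illustration only.
example_workflow = {
    "plan": ["research", "draft"],
    "research": ["draft"],
    "draft": ["review"],
    "review": [],
}

reach = downstream_reach(example_workflow)
for task, nodes in reach.items():
    print(task, len(nodes))
```

Under this toy model, an upstream subtask such as `plan` reaches every later subtask, while a terminal subtask such as `review` reaches none. That is one simple way to read the statement that position alone changes how far a signal can spread. A defender could use the same reachability count to decide which subtasks merit closer inspection.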