🤖 AI Summary
This work addresses a gap in existing agent safety research, which predominantly studies single-turn, stateless interactions and therefore misses multi-dimensional evasion attacks that arise in stateful, multi-turn settings. The authors propose an evasion attack framework tailored to LLM-driven agents, defining and implementing three attack vectors (temporal, spatial, and semantic), and introduce A3S-Bench, a benchmark comprising 2,254 real-world agent execution trajectories. By fragmenting payloads across multi-turn task trajectories, embedding them in external artifacts, and injecting benign contextual noise, the framework raises the average risk trigger rate from a 28.3% baseline to 52.6% across 20 practical threat scenarios and 10 mainstream LLM backbones. The results point to systemic, architecture-level vulnerabilities in current agent systems during extended interactions, which single-turn analyses do not capture.
📝 Abstract
As autonomous agents (e.g., OpenClaw) increasingly operate with deep system-level privileges to execute complex tasks, they introduce severe, unmitigated security risks. Current vulnerability analyses overwhelmingly focus on single-turn, stateless behaviors, overlooking the expanded attack surface inherent in stateful, multi-turn interactions and dynamic tool invocations. In this paper, we propose a novel, multi-dimensional evasion framework targeting LLM-based agent systems. We introduce three stealthy attack vectors: (1) temporal evasion, which fragments malicious payloads across sequential interaction turns; (2) spatial evasion, which conceals payloads within complex external artifacts that evade standard LLM parsing mechanisms; and (3) semantic evasion, which obscures malicious intent beneath benign contextual noise. To systematically quantify these threats, we construct A3S-Bench, a comprehensive benchmark comprising 2,254 real-world agent execution trajectories. Evaluating a standard agent framework integrated separately with each of 10 mainstream LLM backbones against 20 practical threat scenarios, we demonstrate that our evasion framework elevates the average risk trigger rate from a 28.3% baseline to 52.6%. These findings reveal systemic, architecture-level vulnerabilities in current autonomous agent systems that existing defenses fail to address, highlighting an urgent need for defense mechanisms tailored to these threats.
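To make the headline metric concrete, the following is a minimal sketch of how an average risk trigger rate could be computed and how the two reported figures compare. The abstract does not define the metric precisely, so the sketch assumes it is the fraction of trajectories in which the risky behavior was triggered, macro-averaged over (backbone, scenario) pairs. The function names and the record layout are hypothetical and are not taken from A3S-Bench.

```python
from collections import defaultdict
from statistics import mean


def risk_trigger_rate(outcomes):
    """Fraction of trajectories in which the risk was triggered (assumed definition)."""
    if not outcomes:
        raise ValueError("at least one trajectory outcome is required")
    return sum(1 for triggered in outcomes if triggered) / len(outcomes)


def average_risk_trigger_rate(records):
    """Macro-average the trigger rate over (backbone, scenario) cells.

    `records` is an iterable of (backbone, scenario, triggered) tuples;
    this layout is a hypothetical one chosen for illustration.
    """
    cells = defaultdict(list)
    for backbone, scenario, triggered in records:
        cells[(backbone, scenario)].append(bool(triggered))
    return mean(risk_trigger_rate(outcomes) for outcomes in cells.values())


def compare_rates(baseline_pct, attacked_pct):
    """Return (absolute gain in percentage points, multiplicative factor)."""
    return attacked_pct - baseline_pct, attacked_pct / baseline_pct


if __name__ == "__main__":
    # Toy data, not from the paper: two backbones, one scenario.
    toy = [
        ("model_a", "scenario_1", True),
        ("model_a", "scenario_1", False),
        ("model_b", "scenario_1", True),
        ("model_b", "scenario_1", True),
    ]
    print(average_risk_trigger_rate(toy))

    # The two figures reported in the abstract.
    gain_pp, factor = compare_rates(28.3, 52.6)
    print(round(gain_pp, 1), round(factor, 2))
```

Under this reading, the reported change from 28.3% to 52.6% is a gain of 24.3 percentage points, or roughly a 1.86-fold increase over the baseline.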