🤖 AI Summary
This work addresses the vulnerability of large language model (LLM) agents to multi-round adversarial attacks in prolonged, complex interactions—a threat inadequately mitigated by existing single-turn defense mechanisms. We present the first systematic formulation of “long-horizon attacks” and introduce a comprehensive security evaluation benchmark tailored to this threat model, encompassing five novel attack categories: intent hijacking, tool chaining, task injection, objective drifting, and memory poisoning. The benchmark includes 644 test cases across 28 realistic environments. To enable scalable assessment, we develop an adversarial testing platform that simulates multi-turn user–agent–environment interactions and integrates automated attack generation and evaluation. Empirical results demonstrate that state-of-the-art LLM agents are highly susceptible to these long-horizon attacks and that conventional defense strategies largely fail, underscoring the urgent need for new security paradigms.
📝 Abstract
LLM agents are increasingly deployed in long-horizon, complex environments to solve challenging problems, but this expansion exposes them to long-horizon attacks that exploit multi-turn user–agent–environment interactions to achieve objectives infeasible in single-turn settings. To measure agent vulnerabilities to such risks, we present AgentLAB, the first benchmark dedicated to evaluating LLM agent susceptibility to adaptive, long-horizon attacks. Currently, AgentLAB supports five novel attack types—intent hijacking, tool chaining, task injection, objective drifting, and memory poisoning—spanning 28 realistic agentic environments and 644 security test cases. Leveraging AgentLAB, we evaluate representative LLM agents and find that they remain highly susceptible to long-horizon attacks; moreover, defenses designed for single-turn interactions fail to reliably mitigate long-horizon threats. We anticipate that AgentLAB will serve as a valuable benchmark for tracking progress on securing LLM agents in practical settings. The benchmark is publicly available at https://tanqiujiang.github.io/AgentLAB_main.