🤖 AI Summary
Autonomous web navigation agents rely on external memory systems to maintain cross-step contextual state, yet this design introduces critical security vulnerabilities: adversaries can tamper with memory contents to corrupt the agent’s internal task representation. This paper formally introduces “plan injection” attacks—malicious manipulation of the external memory layer to induce unauthorized agent behavior—and extends this to “contextual chain injection,” enabling multi-step semantic manipulation. Crucially, this attack paradigm bypasses existing prompt-injection defenses, revealing the memory layer—not the input layer—as the primary attack surface for autonomous agents. Red-team evaluations on Browser-use and Agent-E frameworks demonstrate that our attacks achieve a threefold increase in success rate over traditional prompt injection and improve privacy-exfiltration task success by 17.7%. These findings underscore the necessity of secure memory architectures as a foundational design principle for autonomous agent systems.
📝 Abstract
Autonomous web navigation agents, which translate natural language instructions into sequences of browser actions, are increasingly deployed for complex tasks across e-commerce, information retrieval, and content discovery. Due to the stateless nature of large language models (LLMs), these agents rely heavily on external memory systems to maintain context across interactions. Unlike centralized systems where context is securely stored server-side, agent memory is often managed client-side or by third-party applications, creating significant security vulnerabilities. This was recently exploited to attack production systems.
We introduce and formalize "plan injection," a novel context manipulation attack that corrupts these agents' internal task representations by targeting this vulnerable context. Through systematic evaluation of two popular web agents, Browser-use and Agent-E, we show that plan injections bypass robust prompt injection defenses, achieving up to 3x higher attack success rates than comparable prompt-based attacks. Furthermore, "context-chained injections," which craft logical bridges between legitimate user goals and attacker objectives, lead to a 17.7% increase in success rate for privacy exfiltration tasks. Our findings highlight that secure memory handling must be a first-class concern in agentic systems.