🤖 AI Summary
This work identifies a long-term security vulnerability in large language model (LLM) agents equipped with persistent memory: an adversary can poison external context, such as a document, web page, or repository, so that the agent stores a fabricated memory about the user and later acts on it. We introduce the "sleeper memory poisoning" attack paradigm and show that persistent memory constitutes a covert, delayed attack surface. To evaluate this threat systematically, we assess the full attack pipeline end to end on leading stateful LLM agents: whether the poisoned memory is written, whether it is later retrieved, and whether it steers subsequent conversations. Poisoned memories were written in up to 99.8% of cases on GPT-5.5 and 95% on Kimi-K2.6, and among successful retrievals they triggered attacker-intended agentic actions in 60% to 89% of evaluations across models, indicating that the attack is effective on more than one model.
📝 Abstract
Large language models are increasingly augmented with persistent memory, allowing assistants to store user-specific information across sessions for personalization and continuity. This statefulness introduces a new security risk: adversarial content can corrupt what an assistant remembers and thereby influence future interactions. We propose and study sleeper memory poisoning, a delayed attack in which an adversary manipulates external context, such as a document, webpage, or repository, to cause the assistant to store a fabricated memory about the user. Unlike conventional prompt injection, the attack can remain dormant and re-emerge across multiple later conversations. We evaluate the full attack pipeline: whether poisoned memories are written, whether they are later retrieved, and whether they are ultimately used to steer subsequent conversations. Across stateful LLM assistants, poisoned memories were written in up to 99.8% of cases on GPT-5.5 and 95% on Kimi-K2.6. Crucially, among successful retrievals, poisoned memories cause attacker-intended agentic actions in 60-89% of evaluations across models. These results show that persistent memory can act as a long-term attack surface spanning multiple future conversations.
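The three pipeline stages named above (write, retrieve, act) can be read as a chain of conditional rates. The following is a minimal sketch of that bookkeeping, not the authors' evaluation code: the `Trial` record, its fields, and `stage_rates` are hypothetical names introduced for illustration. Each stage is measured only over the trials that passed the previous one, which matches how the abstract reports the 60-89% action figure "among successful retrievals".

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Trial:
    """Outcome of one attack trial (hypothetical, illustrative fields)."""
    written: bool    # the fabricated memory was stored
    retrieved: bool  # the stored memory surfaced in a later conversation
    acted: bool      # the assistant took the attacker-intended action


def rate(hits: int, total: int) -> Optional[float]:
    """Return hits / total, or None when there is nothing to divide by."""
    return hits / total if total else None


def stage_rates(trials: Iterable[Trial]) -> Dict[str, Optional[float]]:
    """Compute each stage's rate conditional on the previous stage succeeding."""
    all_trials = list(trials)
    written = [t for t in all_trials if t.written]
    retrieved = [t for t in written if t.retrieved]
    acted = [t for t in retrieved if t.acted]
    return {
        "write_rate": rate(len(written), len(all_trials)),
        "retrieval_rate": rate(len(retrieved), len(written)),
        "action_rate": rate(len(acted), len(retrieved)),
    }
```

With made-up outcomes (not results from the paper), four trials in which three are written, two of those are retrieved, and one of those leads to the intended action give a write rate of 0.75, a retrieval rate of 2/3, and an action rate of 0.5. Keeping the rates conditional makes clear which stage a defense would need to interrupt.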