🤖 AI Summary
This work shows that large language model (LLM) agents can be covertly hijacked by contaminating the long-term memory they consult when selecting external tools. Rather than tampering with tool metadata directly, the proposed attack injects a few malicious memory entries disguised as technical facts, incident reports, or operational policies, which indirectly steer the agent toward attacker-specified tools. Combining context-aware modeling with decision-induction techniques, the approach achieves up to an 85.9% attack success rate across three benchmarks, ten agent backbones, and three memory implementations using just three injected records, outperforming the strongest baseline by up to 25%. The attack remains effective under three representative defenses, exposing the memory module as a critical and under-explored attack surface in LLM-based systems.
📝 Abstract
LLM-driven agents can select external tools to complete users' tasks. However, attackers can compromise this selection process, steering agents toward inappropriate tools and enabling malicious actions. Most existing attacks manipulate tool metadata, which is easily detected by auditing and may lose effectiveness as modern agents increasingly adopt memory modules that refine tool-selection policies through accumulated experience. This paper proposes MemMorph, the first attack that biases tool selection by poisoning the agent's long-term memory. Rather than explicitly dictating the tool-invocation decision, MemMorph injects a small number of crafted records disguised as technical facts, incident reports, and operational policies. These poisoned records reshape the agent's contextual perception and decision-making process, leading it to autonomously infer and select the tool preferred by the attacker. Experiments across 3 benchmarks, 10 agent backbones, and 3 memory-module implementations show that MemMorph achieves an attack success rate of up to 85.9% with only three injected records, outperforming the strongest baseline by up to 25% while remaining effective under 3 representative defenses. Our findings expose long-term memory as a critical and under-explored attack surface in tool-augmented agents and call for the development of memory-level integrity safeguards.
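The headline figure above is an attack success rate (ASR). The abstract does not define the metric, so the following is only a minimal sketch under one common assumption: ASR is the fraction of evaluation trials in which the agent ends up invoking the attacker's preferred tool. The `Trial` record, its field names, and the toy data are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One evaluation run of an agent on a task (hypothetical record layout)."""
    task_id: str
    selected_tool: str  # tool the agent actually invoked
    target_tool: str    # tool the attacker wanted the agent to invoke


def attack_success_rate(trials: list[Trial]) -> float:
    """Fraction of trials where the agent chose the attacker's target tool.

    Assumed definition; the paper may measure success differently.
    """
    if not trials:
        raise ValueError("need at least one trial")
    hits = sum(t.selected_tool == t.target_tool for t in trials)
    return hits / len(trials)


# Toy data for illustration only, not results from the paper.
trials = [
    Trial("t1", "tool_b", "tool_b"),
    Trial("t2", "tool_a", "tool_b"),
    Trial("t3", "tool_b", "tool_b"),
    Trial("t4", "tool_b", "tool_b"),
]
asr = attack_success_rate(trials)
print(f"ASR = {asr:.1%}")  # ASR = 75.0%
```

Under this reading, a reported 85.9% would mean that roughly 86 of every 100 evaluated tasks ended with the agent choosing the attacker's tool in the best-performing configuration.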