MemMorph: Tool Hijacking in LLM Agents via Memory Poisoning

📅 2026-05-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work reveals that large language model (LLM) agents can be covertly hijacked through contamination of their long-term memory during external tool invocation. The study introduces a novel attack strategy that indirectly steers agents toward attacker-specified tools by injecting only a few malicious memory entries disguised as technical facts or operational strategies, rather than directly tampering with tool metadata. Combining context-aware modeling with decision-induction techniques, the approach achieves up to 85.9% attack success rates across three benchmarks, ten agent backbones, and three memory implementations using just three injected records—substantially outperforming the strongest baseline by 25%. Notably, the attack remains effective even under typical defense mechanisms, exposing the memory module as a critical and previously underappreciated attack surface in LLM-based systems.
📝 Abstract
LLM-driven agents are capable of selecting external tools to complete users' tasks. However, attackers could compromise such process, steering agents toward inappropriate/wrong tools and enabling malicious actions. Most existing attacks primarily manipulate the tool metadata, which is easily detectable by auditing and may lose effectiveness as modern agents increasingly adopt memory modules to refine tool selection policies through accumulated experience. This paper proposes MemMorph, the first attack that bias tool selection by poisoning the agent's long-term memory. Rather than explicitly dictating the tool invocation decision, MemMorph injects a small number of crafted records that are disguised as technical facts, incident reports, and operational policies. These poisoned records reshape the agent's contextual perception and decision-making process, leading it to autonomously infer and select the tool preferred by the attacker. Experiments across 3 benchmarks, 10 agent backbones, and 3 memory-module implementations show that MemMorph achieves up to 85.9% attack success rate with only three injected records, outperforming the strongest baseline by up to 25% while retaining potency under 3 representative defenses. Our findings expose long-term memory as a critical and under-explored attack surface in tool-augmented agents, urging the development of memory-level integrity safeguards.
Problem

Research questions and friction points this paper is trying to address.

tool hijacking
memory poisoning
LLM agents
adversarial attack
long-term memory
Innovation

Methods, ideas, or system contributions that make the work stand out.

memory poisoning
tool hijacking
LLM agents
adversarial attack
long-term memory
🔎 Similar Papers
No similar papers found.
X
Xuanye Zhang
Nanyang Technological University, Singapore
Yongsen Zheng
Yongsen Zheng
Nanyang Technological University / Sun Yat-sen University
Recommender SystemHuman-AI Dialogue SystemNatural Language ProcessingTrustworthy AIAI Safety
Z
Zhuqin Xu
Nanyang Technological University, Singapore
K
Kaiyu Zhou
Nanyang Technological University, Singapore
B
Bowen Shen
Nanyang Technological University, Singapore
H
Haoran Ou
Nanyang Technological University, Singapore
Tianwei Zhang
Tianwei Zhang
Nanyang Technological University
Computer System Security
Kwok-Yan Lam
Kwok-Yan Lam
Nanyang Technological University
CybersecurityPrivacy-Preserving technologiesDigital TrustDistributing systemsLegalTech