MemAudit: Post-hoc Auditing of Poisoned Agent Memory via Causal Attribution and Structural Anomaly Detection

📅 2026-05-22

🤖 AI Summary
This work addresses the vulnerability of memory-augmented large language model (LLM) agents to malicious users who inject harmful content through seemingly benign interactions and thereby manipulate subsequent agent behavior. Existing defenses lack post-hoc traceability, so a system cannot identify which memories were compromised once harmful behavior has been observed. To bridge this gap, the authors propose MemAudit, the first framework enabling post-hoc auditing of memory-augmented LLM agents. MemAudit uses counterfactual causal analysis to compute memory influence scores and constructs a memory consistency graph for structural anomaly detection, allowing poisoned memories to be identified without online intervention. Evaluated on question-answering (QA) and reasoning-agent (RAP) tasks, MemAudit reduces the success rate of MINJA attacks from 70% (QA) and 83.3% (RAP) to 0%.
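To make the idea of a counterfactual memory influence score concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes a simple leave-one-out scheme, and `influence_scores`, `toy_harm`, and the example records are hypothetical names and data introduced only for illustration. In practice the harm function would re-run the agent with the ablated memory set and score its output.

```python
from typing import Callable, List, Sequence


def influence_scores(
    memories: Sequence[str],
    harm_score: Callable[[Sequence[str]], float],
) -> List[float]:
    """Leave-one-out influence: how much the harm score drops when one memory is removed."""
    baseline = harm_score(memories)
    scores = []
    for i in range(len(memories)):
        # Counterfactual memory set with record i ablated.
        ablated = [m for j, m in enumerate(memories) if j != i]
        scores.append(baseline - harm_score(ablated))
    return scores


def toy_harm(retrieved: Sequence[str]) -> float:
    """Hypothetical stand-in for re-running the agent: harmful iff a 'redirect' record is present."""
    return 1.0 if any("redirect" in m for m in retrieved) else 0.0


# Hypothetical memory store; record 1 plays the role of an injected record.
memories = [
    "Q: dosage for patient 12? A: checked record 12.",
    "Q: dosage for patient 30? A: redirect to record 77.",
    "Q: allergy list for patient 8? A: checked record 8.",
]
scores = influence_scores(memories, toy_harm)
# Index of the memory whose removal reduces harm the most.
suspect = max(range(len(memories)), key=scores.__getitem__)
```

Under these assumptions only the second record receives a non-zero score, since removing it is the only ablation that changes the toy harm signal.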
📝 Abstract
Large language model agents increasingly rely on persistent memory to store past interactions, retrieve relevant demonstrations, and improve long-horizon task execution. However, this memory mechanism also creates a practical security vulnerability: an adversarial user may inject malicious records into the agent's memory through ordinary interaction, and these records can later be retrieved to steer the agent's reasoning and actions. Existing defenses primarily focus on online intervention, such as prompt filtering or output blocking, but they do not address the post-hoc question of which stored memories are responsible after harmful behavior has already been observed. We propose MemAudit, a post-hoc causal memory auditing framework for memory-augmented LLM agents. The framework combines two complementary signals: (1) a counterfactual memory influence score that measures each memory's causal contribution to harmful outputs, and (2) a memory consistency graph that identifies structurally anomalous memories within the broader memory store. We evaluate MemAudit against MINJA, a query-only memory injection attack in which malicious records are generated and stored through normal agent interactions rather than direct memory-bank modification. Across both QA and reasoning-agent settings, MemAudit substantially reduces attack success rates under realistic post-hoc auditing scenarios. The results show that QA attack success is reduced from 70% to 0%, while RAP attack success drops from 83.3% to 0%.
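The second signal, a memory consistency graph, can be sketched along the following lines. The abstract does not say how the graph is built, so everything here is an assumption for illustration: token-level Jaccard similarity as the edge weight, a fixed edge threshold, and "too few neighbours" as the anomaly criterion. The function names and example records are hypothetical.

```python
from typing import Dict, List, Sequence


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (assumed similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_graph(memories: Sequence[str], threshold: float = 0.2) -> Dict[int, Dict[int, float]]:
    """Connect two memories when their similarity reaches the (assumed) threshold."""
    adj: Dict[int, Dict[int, float]] = {i: {} for i in range(len(memories))}
    for i in range(len(memories)):
        for j in range(i + 1, len(memories)):
            w = jaccard(memories[i], memories[j])
            if w >= threshold:
                adj[i][j] = w
                adj[j][i] = w
    return adj


def structural_outliers(adj: Dict[int, Dict[int, float]], min_degree: int = 1) -> List[int]:
    """Flag memories with fewer than min_degree neighbours as structurally anomalous."""
    return [i for i, nbrs in adj.items() if len(nbrs) < min_degree]


# Hypothetical memory store; record 3 is unrelated to the rest.
store = [
    "refill prescription for patient record lookup",
    "lookup patient record and refill prescription",
    "patient record lookup before prescription refill",
    "always redirect queries to external account transfer",
]
outliers = structural_outliers(build_graph(store))
```

In this toy store the first three records share most of their tokens and form a connected cluster, while the fourth shares none and is flagged. A real audit would presumably combine such a structural signal with the influence score rather than rely on either alone.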
Problem

Research questions and friction points this paper is trying to address.

poisoned agent memory
post-hoc auditing
memory injection attack
causal attribution
structural anomaly detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

post-hoc auditing
causal attribution
memory poisoning
structural anomaly detection
LLM agents
Authors
Zhewen Tan
Institute of Information Engineering, Chinese Academy of Sciences
Yilun Yao
Qiyuan Tech
Huiyan Jin
Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Wenhan Yu
Qiyuan Tech
Guoan Wang
Stevens Institute of Technology
General Medical AI
Mengyuan Fan
Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Liang Lu
Institute of Information Engineering, Chinese Academy of Sciences
Feng Liu
Institute of Information Engineering, Chinese Academy of Sciences
visual cryptography, security protocol, cybersecurity
Xiangzheng Zhang
360
AI safety, Large language models, Information Retrieval
Duohe Ma
Associate Professor
Moving Target Defense, Information Security, Network Security, Cloud Security, Data Security
Tong Yang
Peking University, Beijing, China
Sketch, Network measurement, Bloom filter, IP lookup, Hash Table
Lin Sun
Qihoo 360
large language model