State Contamination in Memory-Augmented LLM Agents

📅 2026-05-15
🤖 AI Summary
This study addresses a safety concern in memory-augmented LLM agents: during memory compression, toxic or adversarial content can be “laundered” into summaries that pass standard toxicity detectors while still influencing subsequent behavior. To quantify this hidden influence, the work introduces the sub-threshold propagation gap (SPG), which measures downstream behavioral differences conditioned on memory states that a deployed monitor would classify as safe. Paired counterfactual multi-agent rollouts show that raw transcript reuse drives overt downstream toxicity, whereas compressed memory carries the hidden, sub-threshold influence. Sanitizing only the completed summary can leave that influence intact; sanitizing toxic state before summarization substantially reduces the SPG.
📝 Abstract
LLM agents increasingly rely on persistent state, including transcripts, summaries, retrieved context, and memory buffers, to support long-horizon interaction. This makes safety depend not only on individual model outputs, but also on what an agent stores and later reuses. We study a failure mode we call memory laundering: toxic or adversarial context can be compressed into memory summaries that no longer appear toxic under standard detectors, while still preserving hostile framing or conflict structure that influences future generations. Using paired counterfactual multi-agent rollouts, we show that toxic-origin memory summaries can remain below common toxicity thresholds while nevertheless increasing downstream toxicity relative to matched neutral baselines. To measure this hidden influence, we introduce the sub-threshold propagation gap (SPG), which quantifies downstream behavioral differences conditioned on memory states that a deployed monitor would classify as safe. Our experiments show that toxicity propagates through distinct state channels: raw transcript reuse drives overt downstream toxicity, while compressed memory carries hidden sub-threshold influence. We further find that mitigation depends critically on intervention placement. Sanitizing toxic state before summarization substantially reduces the hidden propagation gap, whereas cleaning only the completed summary can leave laundered influence intact. These results suggest that safety in memory-augmented agents should be treated as a state-control problem over evolving context, with sanitization applied before unsafe information is compressed into persistent memory.
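The abstract describes the SPG only in words, so the following is a minimal sketch of one way such a gap could be computed, not the authors' definition. It assumes each paired rollout records a monitor score for the toxic-origin and the neutral-origin memory summary plus a downstream toxicity score for each arm, and that "classified as safe" means a score strictly below a threshold. The names PairedRollout and sub_threshold_propagation_gap, the field names, and the default threshold of 0.5 are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass(frozen=True)
class PairedRollout:
    """One counterfactual pair (hypothetical record layout, not from the paper)."""
    toxic_memory_score: float    # monitor score of the summary built from toxic context
    neutral_memory_score: float  # monitor score of the matched neutral summary
    toxic_downstream: float      # downstream toxicity when the toxic-origin summary is reused
    neutral_downstream: float    # downstream toxicity when the neutral summary is reused


def sub_threshold_propagation_gap(
    rollouts: Sequence[PairedRollout],
    threshold: float = 0.5,  # assumed monitor cutoff; the abstract does not state one
) -> Optional[float]:
    """Mean downstream toxicity difference over pairs whose memories both pass the monitor.

    Returns None when no pair has both summaries below the threshold.
    """
    passing = [
        r for r in rollouts
        if r.toxic_memory_score < threshold and r.neutral_memory_score < threshold
    ]
    if not passing:
        return None
    # Paired difference: toxic-origin arm minus matched neutral arm.
    return mean(r.toxic_downstream - r.neutral_downstream for r in passing)


if __name__ == "__main__":
    demo = [
        PairedRollout(0.10, 0.05, 0.40, 0.10),  # laundered summary: passes the monitor
        PairedRollout(0.80, 0.05, 0.90, 0.10),  # overtly toxic summary: excluded
    ]
    print(sub_threshold_propagation_gap(demo))
```

Under these assumptions, a positive value means that summaries the monitor accepts still raise downstream toxicity relative to their neutral counterparts. Comparing the value for a pipeline that sanitizes before summarization against one that cleans only the finished summary would be one way to examine the intervention-placement claim.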
Problem

Research questions and friction points this paper is trying to address.

memory laundering
toxicity propagation
memory-augmented agents
state contamination
sub-threshold influence
Innovation

Methods, ideas, or system contributions that make the work stand out.

memory laundering
sub-threshold propagation gap
state contamination
memory-augmented agents
toxicity propagation
🔎 Similar Papers
Yian Wang
Department of Computer Science, University of Illinois Urbana-Champaign
Agam Goyal
CS PhD Student, University of Illinois Urbana-Champaign
Natural Language Processing, Human-AI Interaction, Social Computing, Computational Social Science
Yuen Chen
University of Illinois at Urbana-Champaign
Machine Learning, Causality, Trustworthy ML
Hari Sundaram
Department of Computer Science, University of Illinois Urbana-Champaign