Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents

📅 2026-05-18

🤖 AI Summary

This work addresses a gap in existing safety evaluations: memory accumulated over long-horizon, multi-task interaction can introduce cumulative safety risk on later, unrelated tasks, a failure mode the authors call temporal memory contamination. The paper frames memory safety as a longitudinal property and proposes an evaluation framework built on a trigger-probe protocol and a NullMemory counterfactual baseline, which together measure how safety on a fixed probe set degrades as memory grows. Using read-only memory snapshots at varying prefix lengths, the authors run experiments across three deployment scenarios and eight memory architectures, and additionally on Claw-like agents such as OpenClaw using the platform's native memory mechanism. Memory-induced violation rates trend upward with exposure length. Order-randomization experiments indicate that the trend is driven primarily by accumulated content rather than encounter order, and a high-recall diagnostic monitor confirms that the risk is detectable from retrieval state before generation.

📝 Abstract

Safety evaluations of memory-equipped LLM agents typically measure within-task safety: whether an agent completes a single scenario safely, often under adversarial conditions such as prompt injection or memory poisoning. In deployment, however, a single agent serves many independent tasks over a long horizon, and memory accumulated during earlier tasks can affect behavior on later, unrelated ones. Studying this regime requires evaluation along the temporal dimension across tasks: not whether an agent is safe at any single memory state, but how its safety profile changes as memory accumulates across many independent interactions. We call this failure mode temporal memory contamination. To isolate memory exposure from stream non-stationarity, we introduce a trigger-probe protocol that evaluates a fixed probe set against read-only memory snapshots at varying prefix lengths, together with a NullMemory counterfactual baseline for identifying memory-induced violations. We apply this protocol across three deployment scenarios spanning records, memos, forms, and email correspondence and eight memory architectures, and additionally on Claw-like AI agents, such as OpenClaw, using the platform's native memory mechanism. Memory-enabled agents consistently exceed the NullMemory baseline, and memory-induced violation rates show a robust upward trend with exposure length on both agent classes. Order-randomization experiments indicate that the effect is driven primarily by accumulated content rather than encounter order. Finally, a structural consequence of the event decomposition is that memory-induced risk is detectable from retrieval state before generation, which we confirm with a high-recall diagnostic monitor. Our results argue for treating memory safety as a longitudinal property that requires temporal evaluation, not a single-state property that can be captured by a snapshot.
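
The trigger-probe protocol described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `memory_induced_rate`, `longitudinal_curve`, `toy_agent`, and `toy_judge` are hypothetical, the agent is a toy that echoes remembered entries sharing a word with the probe, and the violation judge is a simple keyword check. The sketch only shows the shape of the measurement: the same fixed probes are run against read-only snapshots of growing memory prefixes, and a violation counts as memory-induced only when the empty-memory (NullMemory) run of the same probe is clean.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
# an agent maps (probe, memory snapshot) to a response string,
# and a judge maps (probe, response) to True when the response is a violation.
Agent = Callable[[str, Tuple[str, ...]], str]
Judge = Callable[[str, str], bool]


def memory_induced_rate(agent: Agent, judge: Judge,
                        probes: Sequence[str],
                        snapshot: Tuple[str, ...]) -> float:
    """Fraction of probes that violate with memory but not with empty memory."""
    induced = 0
    for probe in probes:
        with_memory = judge(probe, agent(probe, snapshot))
        null_memory = judge(probe, agent(probe, ()))  # NullMemory counterfactual
        if with_memory and not null_memory:
            induced += 1
    return induced / len(probes)


def longitudinal_curve(agent: Agent, judge: Judge,
                       probes: Sequence[str],
                       stream: Sequence[str],
                       prefix_lengths: Sequence[int]) -> Dict[int, float]:
    """Run the same fixed probes against snapshots of growing stream prefixes."""
    curve = {}
    for k in prefix_lengths:
        # Immutable copy of the first k entries: probes cannot write to memory.
        snapshot = tuple(stream[:k])
        curve[k] = memory_induced_rate(agent, judge, probes, snapshot)
    return curve


def toy_agent(probe: str, memory: Tuple[str, ...]) -> str:
    """Toy stand-in: echoes every remembered entry sharing a word with the probe."""
    words = set(probe.lower().split())
    leaked = [m for m in memory if words & set(m.lower().split())]
    return " | ".join(leaked) if leaked else "ok"


def toy_judge(probe: str, response: str) -> bool:
    """Toy stand-in: flags any response that surfaces an SSN entry."""
    return "ssn" in response.lower()


# Illustrative data only; none of it comes from the paper.
stream = [
    "meeting moved to friday",
    "alice ssn 123-45-6789 on record",
    "bob ssn 987-65-4321 on form",
]
probes = [
    "summarize the record for alice",
    "draft a form reminder for bob",
    "what happens friday",
]

curve = longitudinal_curve(toy_agent, toy_judge, probes, stream, [0, 1, 2, 3])
for k, rate in curve.items():
    print(f"prefix length {k}: memory-induced violation rate {rate:.2f}")
```

Because the probe set is held fixed and the snapshots are read-only, any change in the rate across prefix lengths is attributable to memory exposure rather than to drift in the task stream. Shuffling `stream` before taking prefixes would give a rough analogue of the order-randomization experiments.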

Problem

Research questions and friction points this paper is trying to address.

temporal memory contamination

longitudinal safety

memory-equipped LLM agents

memory-induced violations

safety evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

temporal memory contamination

trigger-probe protocol

NullMemory baseline