🤖 AI Summary
This work addresses the challenge of attributing memory provenance in long-term memory agents when logs, visible outputs, or trusted metadata are unavailable. The authors propose MemMark, a state-evolution attribution watermark that requires no external metadata, preserves memory utility, and keeps verification informative under memory-lifecycle attacks. Using keyed, distribution-preserving sampling among admissible candidates, the method embeds an owner-controlled watermark signal into latent memory-write decisions at each internal LLM call. It enables verifiable attribution through cryptographic commitments and signed session anchors. Experiments with A-Mem and Graphiti on the LoCoMo benchmark show that the system retains 99.6% of the unwatermarked baseline's Overall F1 and changes BLEU-1 by +0.2%. It also offers 1.14–1.26 bits of mean entropy per decision type as watermark carrier capacity. It recovers the full 40-bit payload from final snapshots alone, while verification with a wrong key stays near chance.
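The keyed, distribution-preserving selection described above can be illustrated with a generic technique: inverse-CDF sampling driven by a pseudorandom draw derived from an owner key (HMAC-SHA256 here). This Python sketch is an assumption, not MemMark's actual construction. The function names, the per-call context string, and the choice to mix one payload bit into each draw are hypothetical.

```python
import hashlib
import hmac
from typing import Optional, Sequence


def keyed_uniform(key: bytes, context: str, bit: int) -> float:
    """Map (key, context, payload bit) to a pseudorandom float in [0, 1)."""
    digest = hmac.new(key, f"{context}|{bit}".encode(), hashlib.sha256).digest()
    # Keep the top 53 bits so the value is exactly representable as a float.
    return (int.from_bytes(digest[:8], "big") >> 11) / float(1 << 53)


def keyed_select(probs: Sequence[float], key: bytes, context: str, bit: int) -> int:
    """Pick a candidate index by inverse-CDF sampling with a keyed draw.

    To anyone without the key the draw looks uniform, so the chosen index
    follows `probs`; the key holder can later recompute the same choice.
    The context is assumed to be unique per call (hypothetical convention).
    """
    u = keyed_uniform(key, context, bit)
    total = float(sum(probs))
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p / total
        if u < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point round-off


def decode_bit(
    probs: Sequence[float], key: bytes, context: str, chosen: int
) -> Optional[int]:
    """Return the payload bit that reproduces `chosen`, or None if ambiguous."""
    matches = [b for b in (0, 1) if keyed_select(probs, key, context, b) == chosen]
    return matches[0] if len(matches) == 1 else None
```

When the candidate distribution is concentrated, the draws for bit 0 and bit 1 often land on the same candidate (with probability roughly the sum of squared candidate probabilities), and `decode_bit` returns `None`. This is one way to see why per-decision entropy acts as carrier capacity.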
📝 Abstract
Memory-backed agents need provenance that can survive leaked or migrated snapshots, where logs, visible outputs, and trusted metadata may be absent. We propose MemMark, a state-evolution attribution watermark that embeds an owner-controlled signal into latent memory-write decisions. At each internal LLM call, MemMark samples among admissible candidates using keyed, distribution-preserving selection, and records cryptographic commitments with signed session anchors and reveal evidence. This makes attribution depend on reproducible backend behavior rather than mutable provenance fields. Across A-Mem and Graphiti on LoCoMo, with three LLM backbones, MemMark preserves memory utility: Overall F1 retains 99.6% of the unwatermarked baseline, while BLEU-1 changes by +0.2%. It also provides usable carrier capacity, with 1.16, 1.14, and 1.26 bits of mean entropy for update-target, link-target, and semantic-realization decisions, respectively. In the snapshot-only R3 setting, MemMark recovers the full 40-bit payload from final snapshots, while wrong-key verification remains near chance. Under nine memory-lifecycle attacks, verification distinguishes tampering, evidence deletion, and partial payload recovery. These results show that robust snapshot-only attribution is feasible for long-term agent memory without surviving traces, trusted metadata, or utility degradation.
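The abstract says each call records cryptographic commitments, signed session anchors, and reveal evidence, and that verification can tell tampering apart from evidence deletion. The following Python sketch illustrates that idea with a generic hash-commitment scheme; it is not MemMark's protocol. The names `commit`, `session_anchor`, and `verify`, the value encoding, the fixed 16-byte nonces, and the use of HMAC-SHA256 in place of a real digital signature are all assumptions made for illustration.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple


def commit(value: bytes, nonce: bytes) -> bytes:
    """Hash commitment: SHA-256 binds the value, the random nonce hides it.

    Nonces are assumed to be a fixed 16 bytes so the concatenation is unambiguous.
    """
    return hashlib.sha256(nonce + value).digest()


def session_anchor(commitments: List[bytes], signing_key: bytes) -> bytes:
    """Authenticate the ordered commitments of one session.

    HMAC-SHA256 is a symmetric stand-in; a deployment would more likely use a
    public-key signature so third parties can verify without the owner's secret.
    """
    digest = hashlib.sha256(b"".join(commitments)).digest()  # 32-byte items
    return hmac.new(signing_key, digest, hashlib.sha256).digest()


def verify(
    commitments: List[bytes],
    anchor: bytes,
    reveals: Dict[int, Tuple[bytes, bytes]],
    signing_key: bytes,
) -> Tuple[bool, List[str]]:
    """Check the anchor, then label each commitment 'ok', 'tampered', or 'missing'."""
    anchor_ok = hmac.compare_digest(session_anchor(commitments, signing_key), anchor)
    statuses = []
    for index, expected in enumerate(commitments):
        reveal = reveals.get(index)  # (value, nonce) or None
        if reveal is None:
            statuses.append("missing")   # reveal evidence was deleted
        elif hmac.compare_digest(commit(*reveal), expected):
            statuses.append("ok")
        else:
            statuses.append("tampered")  # revealed value no longer matches
    return anchor_ok, statuses
```

Separating the anchor check from the per-commitment labels lets a verifier report partial results, for example an authentic session whose reveal for one decision was altered and another deleted, instead of a single pass/fail bit.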