From Lossy to Verified: A Provenance-Aware Tiered Memory for Agents

📅 2026-02-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a challenge faced by long-context agents: compressing interaction histories into summaries can silently drop critical information (such as allergy records) and offers no verifiable provenance, while relying on full raw logs incurs prohibitive computational overhead. To address this, the authors propose TierMem, a two-tier memory framework that answers queries primarily from efficient summaries and, only when evidential support is insufficient, dynamically routes to immutable raw logs via a runtime mechanism. Verified results are then written back as new summaries enriched with traceable provenance links. The approach frames retrieval as an inference-time evidence allocation problem, enabling on-demand access to the cheapest evidence tier that is still sufficient. Evaluated on the LoCoMo benchmark, TierMem achieves an accuracy of 0.851, only 0.022 below the full-log baseline, while reducing input tokens by 54.1% and latency by 60.7%.

📝 Abstract
Long-horizon agents often compress interaction histories into write-time summaries. This creates a fundamental write-before-query barrier: compression decisions are made before the system knows what a future query will hinge on. As a result, summaries can cause unverifiable omissions -- decisive constraints (e.g., allergies) may be dropped, leaving the agent unable to justify an answer with traceable evidence. Retaining raw logs restores an authoritative source of truth, but grounding on raw logs by default is expensive: many queries are answerable from summaries, yet raw grounding still requires processing far longer contexts, inflating token consumption and latency. We propose TierMem, a provenance-linked framework that casts retrieval as an inference-time evidence allocation problem. TierMem uses a two-tier memory hierarchy to answer with the cheapest sufficient evidence: it queries a fast summary index by default, and a runtime sufficiency router escalates to an immutable raw-log store only when summary evidence is insufficient. TierMem then writes back verified findings as new summary units linked to their raw sources. On LoCoMo, TierMem achieves 0.851 accuracy (vs. 0.873 raw-only) while reducing input tokens by 54.1% and latency by 60.7%.
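The answer path described in the abstract can be sketched as a small Python class. This is a hypothetical illustration, not the authors' implementation: the class and method names (`TierMem`, `SummaryUnit`, `answer`) are assumptions, and keyword overlap stands in for the paper's real summary index and sufficiency router.

```python
from dataclasses import dataclass, field

@dataclass
class SummaryUnit:
    """A compressed memory unit with provenance links into the raw log."""
    text: str
    provenance: list = field(default_factory=list)  # indices of raw-log sources

class TierMem:
    """Toy sketch of the two-tier answer path: summaries first, raw logs on escalation."""

    def __init__(self, raw_log, summaries, sufficiency_threshold=1):
        self.raw_log = list(raw_log)      # tier 2: immutable raw-log store
        self.summaries = list(summaries)  # tier 1: fast summary index
        self.threshold = sufficiency_threshold

    def _retrieve(self, texts, query):
        # Toy retrieval: word overlap stands in for a real retriever.
        terms = set(query.lower().split())
        return [t for t in texts if terms & set(t.lower().split())]

    def answer(self, query):
        # Tier 1: try the cheap summary index by default.
        hits = self._retrieve([u.text for u in self.summaries], query)
        if len(hits) >= self.threshold:
            return hits, "summary"
        # Sufficiency router: summary evidence insufficient -> escalate to raw logs.
        raw_hits = self._retrieve(self.raw_log, query)
        # Write back verified findings as new summary units with provenance links.
        for h in raw_hits:
            self.summaries.append(SummaryUnit(text=h, provenance=[self.raw_log.index(h)]))
        return raw_hits, "raw"
```

Under these assumptions, a query that tier 1 cannot support escalates to the raw log once; because the verified finding is written back with a provenance link, a repeat of the same query is served from the summary tier:

```python
mem = TierMem(
    raw_log=["user is allergic to peanuts", "user likes hiking"],
    summaries=[SummaryUnit("user enjoys outdoor activities")],
)
evidence, tier = mem.answer("allergic to peanuts")   # escalates: tier == "raw"
evidence, tier = mem.answer("allergic to peanuts")   # now answered from summaries
```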
Problem

Research questions and friction points this paper is trying to address.

long-horizon agents
provenance
memory tiering
verifiable reasoning
evidence omission
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tiered Memory
Provenance-Aware
Evidence Allocation
Long-horizon Agents
Verified Summarization