MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the unreliability and poor debuggability of the memory systems that large language models rely on for long-horizon reasoning. The paper frames error tracing and attribution in memory systems as a new problem and proposes a framework for it. The framework converts memory pipelines into executable memory evolution graphs, enabling fine-grained, operation-level tracking of information flow. The authors build a benchmark, MemTraceBench, from representative memory systems (Long-Context, RAG, Mem0, and EverMemOS) to study memory failure modes, and introduce an automatic attribution method that iteratively traces operation subgraphs to locate the root cause of a failed case. Their analysis finds that memory failures are systematic, stemming from operation-level issues such as information loss and retrieval misalignment. Feeding the attribution signals into closed-loop prompt optimization improves end-task performance by up to 7.62%.

📝 Abstract

Memory is essential for enabling large language models to support long-horizon reasoning, yet existing memory systems remain unreliable and difficult to debug. Tracing memory's dynamic evolution is crucial to understand how information is synthesized, propagated, or corrupted over time. In this work, we study the new problem of error tracing and attribution in LLM memory systems. We propose a novel framework that transforms memory pipelines into executable memory evolution graphs, enabling fine-grained tracing of operational information flow. We then construct MemTraceBench, a benchmark collected from representative memory systems such as Long-Context, RAG, Mem0, and EverMemOS, to systematically study memory failure modes. We further introduce an automatic attribution method that iteratively traces operation subgraphs to pinpoint the root cause of any failed case. Our analysis reveals that memory failures are systematic, stemming from operation-level issues like information loss and retrieval misalignment. Crucially, we leverage these fine-grained attribution signals to guide downstream prompt optimization, establishing a closed-loop system that automatically corrects faults and boosts end-task performance by up to 7.62%. Code will be released at https://github.com/zjunlp/MemTrace.
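To make the idea of tracing an operation graph back to a root cause more concrete, here is a minimal sketch in Python. It is not the MemTrace implementation: the `OpNode` schema, the `trace_root_cause` function, and the toy pipeline are all hypothetical, and a plain substring check stands in for whatever judgment the paper's attribution method actually uses. The sketch walks upstream from a failed answer and reports the first operation whose output dropped a needed fact even though a parent operation still carried it.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class OpNode:
    """One memory operation in the graph (hypothetical schema)."""
    op_id: str
    op_type: str                 # e.g. "write", "summarize", "retrieve", "answer"
    output: str                  # text produced by this operation
    parents: List[str] = field(default_factory=list)  # ids of upstream operations


def trace_root_cause(graph: Dict[str, OpNode], start: str, fact: str) -> Optional[str]:
    """Walk upstream from `start` and return the id of the operation that lost `fact`.

    An operation is blamed when its output lacks the fact and either
    (a) at least one parent output still contains it, or
    (b) it has no parents, meaning the fact was never stored.
    The substring test is an assumption standing in for a real judge.
    """
    frontier = deque([start])
    seen = set()
    while frontier:
        node = graph[frontier.popleft()]
        if node.op_id in seen:
            continue
        seen.add(node.op_id)
        if fact in node.output:
            continue  # information is intact at this operation
        parent_nodes = [graph[p] for p in node.parents]
        if not parent_nodes or any(fact in p.output for p in parent_nodes):
            return node.op_id
        frontier.extend(node.parents)  # loss happened further upstream
    return None


# Toy pipeline: the summarization step drops the city name.
graph = {
    "w1": OpNode("w1", "write", "Alice moved to Paris in 2021"),
    "s1": OpNode("s1", "summarize", "Alice moved abroad", ["w1"]),
    "r1": OpNode("r1", "retrieve", "Alice moved abroad", ["s1"]),
    "a1": OpNode("a1", "answer", "Unknown", ["r1"]),
}

root = trace_root_cause(graph, "a1", "Paris")
print(root)
```

In this toy case the trace attributes the failure to the summarization step, which is an instance of the information-loss failure mode mentioned in the abstract. A retrieval-misalignment case would look different: the fact would still be present in stored memory but absent from the retrieved context.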

Problem

Research questions and friction points this paper is trying to address.

error tracing

attribution

large language models

memory systems

memory failure

Innovation

Methods, ideas, or system contributions that make the work stand out.

memory tracing

error attribution

memory evolution graph