AI Summary
Existing long-term large language model agents often suffer from source-role confusion and source-monitoring errors because they rely on unstructured textual memory representations. This work proposes MemIR, a typed memory intermediate representation that decomposes long-term memory into three kinds of atoms: evidence, retrieval cues, and factual claims. Using multi-route atomic projection and provenance-scoped utilization, MemIR turns retrieval hits into claim-centered candidate bundles and a normalized fact interface. Notably, it is the first approach to model source monitoring as a structural constraint at the architectural level, explicitly separating informational roles through typed memory atoms to prevent source confusion. Experiments show that MemIR consistently outperforms existing memory baselines on the LoCoMo and BEAM-100K benchmarks, particularly on tasks requiring source tracing, temporal localization, and integration of fragmented evidence.
Abstract
Long-term memory is essential for persistent LLM agents, yet prevailing architectures store historical interactions as unstructured, flat text. This unconstrained storage induces provenance-role collapse, a critical failure mode where agents suffer from source-monitoring errors. To resolve this cognitive vulnerability at the architectural level, we propose MemIR, a typed Memory Intermediate Representation that operationalizes source monitoring as a structural constraint. MemIR writes long-term memory into grounded atoms that separate raw evidence, retrieval cues, and truth-bearing claims, with factual authorization restricted to supported claim atoms. It then applies multi-route atomic projection and provenance-scoped utilization to transform heterogeneous retrieval hits into claim-centered candidate bundles and a normalized fact interface for answer generation. Experiments on LoCoMo and BEAM-100K demonstrate that MemIR consistently outperforms existing memory baselines, especially on tasks requiring source tracking, temporal grounding, and aggregation of fragmented evidence.
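To make the typed-atom idea concrete, here is a minimal Python sketch of how the three atom roles and the claim-centered projection might look. It is not the authors' implementation: the names `AtomType`, `Atom`, `project_to_bundles`, and `fact_interface`, the `supports` link field, and the sample memory contents are all hypothetical, and the rule "a claim is authorized only if it links to at least one stored evidence atom" is an assumed reading of "factual authorization restricted to supported claim atoms".

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class AtomType(Enum):
    EVIDENCE = "evidence"  # raw span from the interaction history
    CUE = "cue"            # retrieval hint; never asserted as a fact
    CLAIM = "claim"        # truth-bearing statement


@dataclass(frozen=True)
class Atom:
    atom_id: str
    kind: AtomType
    text: str
    # Hypothetical link field: ids of the evidence atoms this atom is grounded in.
    supports: tuple[str, ...] = ()


def project_to_bundles(hit_ids: list[str], store: dict[str, Atom]) -> dict[str, list[str]]:
    """Map retrieval hits of any type onto claim-centered bundles.

    Returns {claim_id: [evidence_id, ...]} for supported claims only.
    """
    claims = [a for a in store.values() if a.kind is AtomType.CLAIM]
    bundles: dict[str, list[str]] = {}
    for hit_id in hit_ids:
        hit = store[hit_id]
        if hit.kind is AtomType.CLAIM:
            reached = [hit]
        elif hit.kind is AtomType.EVIDENCE:
            # Route: evidence -> claims grounded in it.
            reached = [c for c in claims if hit.atom_id in c.supports]
        else:
            # Route: cue -> claims that share evidence with the cue.
            reached = [c for c in claims if set(hit.supports) & set(c.supports)]
        for claim in reached:
            evidence = [
                e for e in claim.supports
                if e in store and store[e].kind is AtomType.EVIDENCE
            ]
            if evidence:  # assumed authorization rule: unsupported claims are dropped
                bundles[claim.atom_id] = evidence
    return bundles


def fact_interface(bundles: dict[str, list[str]], store: dict[str, Atom]) -> list[dict]:
    """Flatten bundles into a uniform list of facts with their provenance."""
    return [
        {"claim": store[cid].text, "evidence": [store[e].text for e in ev]}
        for cid, ev in bundles.items()
    ]


# Made-up memory contents, for illustration only.
store = {
    a.atom_id: a
    for a in [
        Atom("e1", AtomType.EVIDENCE, "User: I moved to Lisbon last week."),
        Atom("c1", AtomType.CLAIM, "The user lives in Lisbon.", ("e1",)),
        Atom("q1", AtomType.CUE, "where does the user live", ("e1",)),
        Atom("c2", AtomType.CLAIM, "The user owns a dog."),  # no supporting evidence
    ]
}

bundles = project_to_bundles(["q1", "c2", "e1"], store)
facts = fact_interface(bundles, store)
```

In this sketch the cue hit `q1` and the evidence hit `e1` both resolve to the same claim `c1`, while the unsupported claim `c2` never reaches the fact interface. Only claim text can be asserted; cues and raw evidence appear solely as routes or as provenance, which is the separation of roles the abstract describes.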