🤖 AI Summary
This work addresses reasoning errors in memory-augmented large language models caused by the entanglement of heterogeneous memory types, such as facts, events, and behavioral rules, in a shared representational space. To mitigate this issue, the authors propose MemGuard, a type-aware memory framework that assigns an explicit functional role to each memory at write time, enabling type-isolated storage alongside cross-type relational modeling. During inference, the framework composes evidence only from the memory types needed for the current reasoning task, so a functional-type constraint governs both memory writing and retrieval. Experimental results show that this approach improves memory reliability by up to 28.27% on hallucination and long-horizon conversation benchmarks, while retrieving up to 5.8x fewer memory tokens than prior methods (roughly one-sixth as many).
📝 Abstract
Memory-augmented large language models extend reasoning beyond a fixed context window by maintaining long-term memory across interactions. However, existing memory systems often collapse stable user facts, episodic events, and behavioral rules into a shared space, allowing functionally distinct memories to be retrieved and used as interchangeable evidence. We identify this failure mode as heterogeneous memory contamination, in which context-specific events become overgeneralized claims, or semantically relevant but functionally incompatible memories mislead generation. To address it, we introduce MemGuard, a type-aware memory framework that preserves functional memory boundaries during memory construction and retrieval. It assigns each memory an explicit functional role at write time, maintains relations across type-isolated memories, and selectively composes evidence only from necessary memory types, reducing contamination from irrelevant or functionally incompatible evidence. Across hallucination and long-horizon conversation benchmarks, MemGuard improves memory reliability by up to 28.27% while retrieving up to 5.8x fewer memory tokens than prior methods. These results suggest that reliable long-term reasoning depends on principled organization and selective use of heterogeneous memory.
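The write-time typing and type-restricted retrieval described above can be illustrated with a toy sketch. This is not the authors' implementation: the names `MemoryType`, `Memory`, and `TypedMemoryStore`, the three-way type split, and the word-overlap scoring are all assumptions made for illustration. A real system would presumably use a learned classifier to assign types and an embedding-based retriever.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class MemoryType(Enum):
    """Hypothetical functional roles, mirroring the three kinds named in the abstract."""
    FACT = "fact"    # stable user facts
    EVENT = "event"  # episodic, context-specific events
    RULE = "rule"    # behavioral rules


@dataclass
class Memory:
    id: int
    type: MemoryType
    text: str
    # Ids of linked memories, possibly of other types (cross-type relations).
    related: set[int] = field(default_factory=set)


class TypedMemoryStore:
    """Toy store: one isolated partition per memory type."""

    def __init__(self) -> None:
        self._stores: dict[MemoryType, dict[int, Memory]] = {t: {} for t in MemoryType}
        self._next_id = 0

    def write(self, text: str, mem_type: MemoryType, related_to=()) -> int:
        # The functional role is fixed at write time (assumed to be supplied by the caller).
        mem = Memory(self._next_id, mem_type, text, set(related_to))
        self._stores[mem_type][mem.id] = mem
        self._next_id += 1
        return mem.id

    def retrieve(self, query: str, needed_types, k: int = 3) -> list[Memory]:
        # Only partitions listed in needed_types are searched; other types
        # cannot leak into the evidence regardless of lexical similarity.
        query_words = set(query.lower().split())
        scored = []
        for t in needed_types:
            for mem in self._stores[t].values():
                overlap = len(query_words & set(mem.text.lower().split()))
                if overlap:
                    scored.append((overlap, mem))
        # Highest overlap first; ties broken by insertion order.
        scored.sort(key=lambda pair: (-pair[0], pair[1].id))
        return [mem for _, mem in scored[:k]]


store = TypedMemoryStore()
fact_id = store.write("user is vegetarian", MemoryType.FACT)
store.write("user ate fish at a wedding last week", MemoryType.EVENT, related_to=[fact_id])
store.write("always answer the user in a formal tone", MemoryType.RULE)

# A question about stable preferences consults only the FACT partition.
hits = store.retrieve("what food does the user eat", needed_types=[MemoryType.FACT])
# Searching every partition returns all three memories as interchangeable evidence.
unfiltered = store.retrieve("what food does the user eat", needed_types=list(MemoryType))
```

In this sketch the one-off wedding event and the tone rule both share words with the query, yet neither reaches `hits` because their partitions are never searched. The `related` field only records cross-type links; how MemGuard actually uses such relations during retrieval is not specified in the abstract.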