🤖 AI Summary
This work addresses the limitations of standard Retrieval-Augmented Generation (RAG) in agent memory systems, where reliance on fixed similarity-based retrieval often introduces redundant context, and where post-hoc pruning may disrupt temporal dependencies critical for reasoning. To overcome these issues, the authors propose xMemory, a framework that introduces memory decoupling and hierarchical aggregation into agent memory for the first time. xMemory constructs a semantic hierarchy from high-level topics down to raw messages and employs a top-down retrieval strategy to improve both response diversity and reasoning completeness. Guided by a sparsity–semantics objective, the framework dynamically splits and merges memory units, moving beyond conventional similarity matching. Experiments demonstrate that xMemory, when integrated with mainstream large language models, significantly improves answer quality and token efficiency on the LoCoMo and PerLTQA benchmarks.
📝 Abstract
Agent memory systems often adopt the standard Retrieval-Augmented Generation (RAG) pipeline, yet its underlying assumptions differ in this setting. RAG targets large, heterogeneous corpora where retrieved passages are diverse, whereas agent memory is a bounded, coherent dialogue stream whose spans are highly correlated and often near-duplicates. Under this shift, fixed top-k similarity retrieval tends to return redundant context, and post-hoc pruning can delete temporally linked prerequisites needed for correct reasoning. We argue retrieval should move beyond similarity matching and instead operate over latent components, proceeding from decoupling to aggregation: disentangle memories into semantic components, organise them into a hierarchy, and use this structure to drive retrieval. We propose xMemory, which builds a hierarchy of intact units and maintains a searchable yet faithful organisation of high-level nodes via a sparsity–semantics objective that guides memory split and merge. At inference, xMemory retrieves top-down, selecting a compact, diverse set of themes and semantics for multi-fact queries, and expanding to episodes and raw messages only when doing so reduces the reader's uncertainty. Experiments on LoCoMo and PerLTQA across three recent LLMs show consistent gains in answer quality and token efficiency.
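The abstract's top-down retrieval idea can be sketched in a few lines: start from coarse topic nodes, descend into a child only when the child matches the query noticeably better than its parent, and otherwise keep the compact high-level summary. This is a minimal illustration, not the paper's implementation; the node levels, cosine scoring, and the `expand_gain` threshold (standing in for the paper's "reduces the reader's uncertainty" criterion) are all assumptions for the sketch.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryNode:
    """One unit in a hypothetical hierarchy: topic -> semantic -> episode -> message."""
    level: str                  # e.g. "topic", "semantic", "episode", "message"
    text: str                   # summary at this level (verbatim text for raw messages)
    embedding: list             # vector used for similarity scoring
    children: list = field(default_factory=list)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_down(roots, query_vec, expand_gain=0.1, budget=4):
    """Top-down retrieval sketch: visit nodes best-first, expand into children
    only when a child scores clearly better than its parent (a crude proxy for
    the uncertainty-reduction test), and return a small, diverse context set."""
    selected = []
    frontier = sorted(roots, key=lambda n: -cosine(n.embedding, query_vec))
    for node in frontier:
        if len(selected) >= budget:
            break
        score = cosine(node.embedding, query_vec)
        better = [c for c in node.children
                  if cosine(c.embedding, query_vec) > score + expand_gain]
        if better:
            # Expansion pays off: recurse into the finer-grained units.
            selected += retrieve_top_down(better, query_vec,
                                          expand_gain, budget - len(selected))
        else:
            # The high-level summary already suffices; keep it and stop here.
            selected.append(node)
    return selected
```

In this toy setting, a topic whose raw message matches the query far better than the topic summary gets expanded down to the message, while an unrelated topic is returned as-is at the coarse level, so the context stays compact.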