AI Summary
This work addresses a limitation of long-context dialogue systems: retrieved memory segments are typically passed to the model as unstructured text, lacking the relational, temporal, and thematic organization needed for complex reasoning. To overcome this, the authors propose GRAVITY, a plug-and-play, model-agnostic structured memory module that injects query-relevant structural anchors at generation time: entity relation graphs, causal event chains, and cross-session topic summaries. GRAVITY is presented as the first unified integration of these multidimensional knowledge representations, and it requires no modifications to the underlying language model. Evaluated on the LongMemEval and LoCoMo benchmarks, it improves LLM-judge accuracy by 7.5–10.1% on average; the weakest host system gains 12.2%, while the strongest still gains 3.8–5.7%.
Abstract
Long-horizon conversational agents rely on memory systems with increasingly sophisticated retrieval mechanisms. However, retrieved fragments are typically fed to the language model as unstructured text, lacking the relational, temporal, and thematic structures essential for complex reasoning. To bridge this reasoning gap, we introduce GRAVITY (Generation-time Relational Anchoring Via Injected Topological MemorY), a plug-and-play structured memory module. GRAVITY extracts three complementary knowledge representations from raw conversational utterances: entity profiles grounded in relational graphs, temporal event tuples linked into causal traces, and cross-session topic summaries. At generation time, it injects these representations into the host system's prompt as structured anchoring contexts. This approach effectively synthesizes scattered evidence into a coherent, query-relevant context without requiring any architectural modifications to the host model. Extensive evaluations across five diverse memory systems on the LongMemEval and LoCoMo benchmarks demonstrate the efficacy of our approach. On average, GRAVITY improves LLM-judge accuracy by 7.5–10.1%. Gains are inversely correlated with baseline strength: the weakest host improves by 12.2% while the strongest still gains 3.8–5.7%. These findings establish structured context anchoring as a broadly effective, architecture-agnostic augmentation paradigm for long-horizon conversational memory.
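
To make the injection step concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the three representations can be held as simple records and that query relevance is approximated by substring matching; the names `EntityProfile`, `EventTuple`, `TopicSummary`, `build_anchor_context`, and `inject` are all hypothetical, as are the section labels in the generated text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class EntityProfile:
    """Hypothetical entity record: a name plus (relation, other entity) edges."""
    name: str
    relations: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class EventTuple:
    """Hypothetical event record; `time` is an ISO date so string order is chronological."""
    time: str
    subject: str
    action: str
    caused_by: Optional[str] = None  # description of an earlier causing event, if known


@dataclass
class TopicSummary:
    """Hypothetical cross-session topic summary."""
    topic: str
    summary: str


def _mentions(query: str, term: str) -> bool:
    # Assumption: relevance is plain case-insensitive substring matching.
    return term.lower() in query.lower()


def build_anchor_context(query, entities, events, topics) -> str:
    """Render only the query-relevant records as a structured text block."""
    lines = []
    relevant_entities = [e for e in entities if _mentions(query, e.name)]
    if relevant_entities:
        lines.append("[Entities]")
        for e in relevant_entities:
            rels = "; ".join(f"{rel} {other}" for rel, other in e.relations)
            lines.append(f"- {e.name}: {rels}")
    relevant_events = sorted(
        (ev for ev in events if _mentions(query, ev.subject)),
        key=lambda ev: ev.time,
    )
    if relevant_events:
        lines.append("[Events]")
        for ev in relevant_events:
            cause = f" (caused by: {ev.caused_by})" if ev.caused_by else ""
            lines.append(f"- {ev.time}: {ev.subject} {ev.action}{cause}")
    relevant_topics = [t for t in topics if _mentions(query, t.topic)]
    if relevant_topics:
        lines.append("[Topics]")
        for t in relevant_topics:
            lines.append(f"- {t.topic}: {t.summary}")
    return "\n".join(lines)


def inject(host_prompt: str, anchor: str) -> str:
    """Prepend the anchor block to the host prompt; the host model itself is untouched."""
    return f"{anchor}\n\n{host_prompt}" if anchor else host_prompt
```

In this sketch the host system's own prompt is left intact and the anchor block is simply prepended, which mirrors the plug-and-play property described above. A real system would presumably replace the substring matching with a proper retrieval or graph-traversal step.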