🤖 AI Summary
Existing language agents lack long-term memory systems that can reconstruct complete evidence chains from partial cues, reuse structured graph roles, and self-optimize based on downstream feedback. This work proposes SAGE, a dynamic graph memory engine with three parts: a memory writer incrementally constructs structured graph memory, a memory reader built on a graph foundation model retrieves from it, and a self-evolving feedback mechanism lets the two jointly refine the memory structure. SAGE is presented as the first framework to enable structure-aware, agent-feedback-driven dynamic evolution of graph memories, supporting role reuse and continuous updates. After two self-evolution rounds, it achieves the best average rank on multi-hop question answering. In zero-shot transfer it reaches Recall@2/5 of 82.5/91.6 on the Natural Questions (NQ) dataset, and on the LongMemEval and HaluMem benchmarks it improves multiple long-term memory and hallucination-diagnostic metrics.
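The writer/reader/feedback division can be made concrete with a toy sketch. It rests on loud assumptions and is not SAGE's implementation. Memories are stored as (head, relation, tail) triples in a weighted adjacency dict. The reader is a plain breadth-first expansion from cue entities, standing in for the graph foundation model. "Feedback" simply raises or lowers the weights of edges on a retrieved path, depending on whether the downstream task found that path useful. The class `ToyGraphMemory` and its methods are hypothetical.

```python
from collections import defaultdict, deque


class ToyGraphMemory:
    """Hypothetical graph memory: an undirected, weighted triple store."""

    def __init__(self):
        # entity -> {neighbor: (relation, weight)}
        self.adj = defaultdict(dict)

    # "Writer" role: incrementally add facts extracted from interactions.
    def write(self, head, relation, tail):
        # Re-writing an existing edge is a no-op, so learned weights survive.
        if tail not in self.adj[head]:
            self.adj[head][tail] = (relation, 1.0)
            self.adj[tail][head] = (relation, 1.0)

    # "Reader" role: expand from partial cues to recover an evidence chain.
    def read(self, cues, max_hops=2):
        """Return edges reachable within max_hops of the cue entities.

        Assumption: plain BFS stands in for a learned graph reader.
        """
        seen = {c for c in cues if c in self.adj}
        frontier = deque((c, 0) for c in seen)
        evidence = []
        while frontier:
            node, depth = frontier.popleft()
            if depth == max_hops:
                continue
            # Expand higher-weight (more reinforced) neighbors first.
            ranked = sorted(self.adj[node].items(), key=lambda kv: -kv[1][1])
            for nbr, (rel, _weight) in ranked:
                if nbr not in seen:
                    seen.add(nbr)
                    evidence.append((node, rel, nbr))
                    frontier.append((nbr, depth + 1))
        return evidence

    # "Feedback" role: reinforce or weaken edges the task judged useful.
    def feedback(self, evidence, useful, step=0.5):
        delta = step if useful else -step
        for head, _rel, tail in evidence:
            for a, b in ((head, tail), (tail, head)):
                rel, weight = self.adj[a][b]
                self.adj[a][b] = (rel, max(0.0, weight + delta))
```

In the example below, a single cue ("Alice") recovers the two-hop chain Alice → Acme → Berlin. After positive feedback on the "likes" edge, that edge is expanded first on later reads.

```python
mem = ToyGraphMemory()
mem.write("Alice", "works_at", "Acme")
mem.write("Acme", "located_in", "Berlin")
mem.write("Alice", "likes", "tea")
print(mem.read(["Alice"], max_hops=2))
# [('Alice', 'works_at', 'Acme'), ('Alice', 'likes', 'tea'), ('Acme', 'located_in', 'Berlin')]
```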
📝 Abstract
Long-term memory is becoming a central bottleneck for language agents. Existing RAG and GraphRAG systems largely treat memory graphs as static retrieval middleware. This limits their ability to recover complete evidence chains from partial cues, exploit reusable graph-structural roles, and improve the memory itself through downstream feedback. We introduce SAGE, a Self-evolving Agentic Graph-memory Engine that models graph memory as a dynamic long-term memory substrate. SAGE couples two roles: a memory writer that incrementally constructs structured graph memory from interaction histories, and a Graph Foundation Model-based memory reader that performs retrieval and provides feedback to the memory writer. We provide rigorous theoretical analyses supporting the framework. Across multi-hop QA, open-domain retrieval, domain-specific review QA, and long-term agent-memory benchmarks, SAGE improves evidence recovery, answer grounding, and retrieval efficiency. After two self-evolution rounds, it achieves the best average rank on multi-hop QA, and in zero-shot open-domain transfer it reaches 82.5/91.6 Recall@2/5 on NQ. Further results on LongMemEval and HaluMem show that training and reader-writer feedback improve multiple long-term memory and hallucination-diagnostic metrics. These results suggest that self-evolving, structure-aware graph memory is a promising foundation for robust long-horizon language agents.
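The reported Recall@2/5 figures are retrieval metrics, but the abstract does not say which variant of Recall@k is used. The sketch below assumes a common convention: for each query, it takes the fraction of gold passage IDs that appear among the top-k retrieved IDs, then averages over queries and reports a percentage. Other papers instead count a query as a hit if any gold passage appears in the top k. The function names are illustrative only.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Fraction of gold passage IDs found among the top-k ranked IDs."""
    if not gold:
        raise ValueError("each query needs at least one gold passage")
    top_k = set(ranked[:k])
    return len(top_k & gold) / len(gold)


def mean_recall_at_k(runs: Iterable[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """Average recall@k over (ranked_ids, gold_ids) pairs, as a percentage."""
    scores = [recall_at_k(ranked, gold, k) for ranked, gold in runs]
    return 100.0 * sum(scores) / len(scores)


runs = [
    (["p3", "p1", "p7", "p2", "p9"], {"p1"}),  # gold passage ranked 2nd
    (["p4", "p8", "p5", "p6", "p2"], {"p2"}),  # gold passage ranked 5th
]
print(mean_recall_at_k(runs, 2), mean_recall_at_k(runs, 5))  # 50.0 100.0
```

Because the top-5 list always contains the top-2 list, Recall@5 can never be lower than Recall@2. This is consistent with the 82.5 versus 91.6 gap reported on NQ.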