🤖 AI Summary
This work addresses the challenge of maintaining coherence and reasoning efficiency in long-term conversational agents, which is often hindered by fragmented memory representations. To overcome this, the authors propose a narrative-driven memory mechanism that organizes dialogue histories into structured episodic narratives through offline agentic reasoning. The approach integrates a momentum-based consolidation strategy to stabilize memory traces and transforms peripheral factual details into semantic memory. During retrieval, reasoning leverages the narrative structure to enhance coherence. The method moves beyond conventional embedding- or schema-based memory systems, achieving significant gains on the LOCOMO benchmark: it outperforms existing approaches, cutting response time by 50% while attaining near full-context reasoning quality, and demonstrates superior memory coverage and response fidelity compared to embedding-based retrieval.
📝 Abstract
Long-term conversational agents face a fundamental scalability challenge as interactions extend over time: repeatedly processing entire conversation histories becomes computationally prohibitive. Current approaches attempt to solve this with memory frameworks that fragment conversations into isolated embeddings or graph representations and retrieve the relevant pieces in a RAG style. While computationally efficient, these methods often treat memory formation as a minimal preprocessing step and fail to capture the subtlety and coherence of human memory. We introduce Amory, a working memory framework that actively constructs structured memory representations through agentic reasoning performed offline. Amory organizes conversational fragments into episodic narratives, consolidates memories with momentum, and semanticizes peripheral facts into semantic memory. At retrieval time, the system employs coherence-driven reasoning over the narrative structures. Evaluated on the LOCOMO benchmark for long-term reasoning, Amory achieves considerable improvements over the previous state of the art, with performance comparable to full-context reasoning while reducing response time by 50%. Analysis shows that momentum-aware consolidation significantly enhances response quality, while coherence-driven retrieval provides superior memory coverage compared to embedding-based approaches.
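The abstract does not spell out the consolidation rule, but "consolidates memories with momentum" suggests something like an exponential-moving-average update, where an established memory trace absorbs new evidence slowly rather than being overwritten. The sketch below is a minimal illustration under that assumption; the class name `MemoryTrace`, the `beta` parameter, and the EMA formulation are hypothetical, not taken from the paper.

```python
class MemoryTrace:
    """Hypothetical momentum-consolidated episodic memory trace (EMA sketch)."""

    def __init__(self, embedding, beta=0.9):
        self.embedding = list(embedding)  # current consolidated representation
        self.beta = beta                  # momentum: higher = more stable trace

    def consolidate(self, new_embedding):
        # Blend the incoming observation into the trace with momentum, so
        # established memories shift gradually and resist transient noise.
        self.embedding = [
            self.beta * old + (1.0 - self.beta) * new
            for old, new in zip(self.embedding, new_embedding)
        ]
        return self.embedding


trace = MemoryTrace([1.0, 0.0], beta=0.9)
trace.consolidate([0.0, 1.0])  # trace.embedding is now approximately [0.9, 0.1]
```

With `beta=0.9`, a single contradictory observation nudges the trace only 10% of the way toward the new evidence, which is one plausible way to obtain the stability the summary attributes to momentum-based consolidation.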