🤖 AI Summary
This work addresses the limited ability of current language agents to recall and reason over their interaction histories in context. Existing memory systems are predominantly semantic: they neglect the temporal and spatial context of events and lack explicit event modeling. To bridge this gap, the authors formalize the challenges of episodic recollection and reasoning in language agents and propose REMem, a two-phase framework. Offline, it constructs a hybrid memory graph that links time-aware gists (summaries) with facts; online, it uses an agentic retriever equipped with curated tools to retrieve iteratively over this graph. Evaluated on four episodic memory benchmarks, REMem substantially outperforms state-of-the-art memory systems such as Mem0 and HippoRAG 2, with absolute improvements of 3.4% and 13.4% on episodic recollection and reasoning tasks, respectively, and it refuses unanswerable questions more robustly.
📝 Abstract
Humans excel at remembering concrete experiences within their spatiotemporal contexts and at reasoning across those events, i.e., they have the capacity for episodic memory. In contrast, memory in language agents remains mainly semantic, and current agents are not yet capable of effectively recollecting and reasoning over interaction histories. We identify and formalize the core challenges of episodic recollection and reasoning that arise from this gap, and observe that existing work often overlooks episodicity, lacks explicit event modeling, or overemphasizes simple retrieval rather than complex reasoning. We present REMem, a two-phase framework for constructing and reasoning with episodic memory: (1) offline indexing, where REMem converts experiences into a hybrid memory graph that flexibly links time-aware gists and facts; and (2) online inference, where REMem employs an agentic retriever with carefully curated tools for iterative retrieval over the memory graph. Comprehensive evaluation across four episodic memory benchmarks shows that REMem substantially outperforms state-of-the-art memory systems such as Mem0 and HippoRAG 2, with 3.4% and 13.4% absolute improvements on episodic recollection and reasoning tasks, respectively. REMem also demonstrates more robust refusal behavior for unanswerable questions.
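The two phases described in the abstract can be illustrated with a toy sketch. This is not REMem's implementation: every class, field, and function name below (`Gist`, `Fact`, `MemoryGraph`, `find_facts`, `gists_between`, `answer_where`) is a hypothetical stand-in, and the fixed two-step lookup replaces what in an agentic retriever would presumably be a language model choosing tool calls iteratively.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Gist:
    """Time-aware summary of one episode (hypothetical structure)."""
    gist_id: str
    when: date
    summary: str


@dataclass
class Fact:
    """A (subject, relation, object) detail linked to a gist (hypothetical)."""
    subject: str
    relation: str
    obj: str
    gist_id: str


@dataclass
class MemoryGraph:
    gists: dict = field(default_factory=dict)  # gist_id -> Gist
    facts: list = field(default_factory=list)  # Fact nodes, linked via gist_id

    # Offline indexing: store one experience as a gist plus its linked facts.
    def index(self, gist, triples):
        self.gists[gist.gist_id] = gist
        self.facts.extend(Fact(s, r, o, gist.gist_id) for s, r, o in triples)

    # Tools an online retriever could call (illustrative names only).
    def find_facts(self, subject=None, relation=None):
        return [f for f in self.facts
                if (subject is None or f.subject == subject)
                and (relation is None or f.relation == relation)]

    def gists_between(self, start, end):
        hits = [g for g in self.gists.values() if start <= g.when <= end]
        return sorted(hits, key=lambda g: g.when)


def answer_where(graph, person, start, end):
    """Toy online inference: combine two tool calls, refuse if nothing matches."""
    in_window = {g.gist_id for g in graph.gists_between(start, end)}
    hits = [f for f in graph.find_facts(subject=person, relation="visited")
            if f.gist_id in in_window]
    if not hits:
        return None  # refusal: the memory holds no supporting episode
    hits.sort(key=lambda f: graph.gists[f.gist_id].when)
    return [f.obj for f in hits]


graph = MemoryGraph()
graph.index(Gist("g1", date(2024, 3, 1), "Alice visited Paris for a conference."),
            [("Alice", "visited", "Paris")])
graph.index(Gist("g2", date(2024, 6, 10), "Alice took a holiday in Rome."),
            [("Alice", "visited", "Rome")])

print(answer_where(graph, "Alice", date(2024, 5, 1), date(2024, 12, 31)))  # ['Rome']
print(answer_where(graph, "Bob", date(2024, 1, 1), date(2024, 12, 31)))    # None
```

The first query needs both the temporal context (from the gists) and the factual detail (from the linked facts), which is the kind of question a purely semantic store handles poorly. The second returns `None` to mimic refusing an unanswerable question.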