🤖 AI Summary
Memory mechanisms in large language model (LLM) agents are essential for long-horizon, complex tasks, yet existing approaches have not been systematically compared under a unified framework. This work proposes the first unified architecture encompassing mainstream memory approaches and conducts a comprehensive evaluation and ablation study under standardized benchmarks and consistent experimental conditions, revealing the strengths, weaknesses, and suitable application scenarios of each method. Building on these insights, the authors design a modular memory mechanism that combines optimized strategies and significantly outperforms existing state-of-the-art methods on two established benchmarks, offering a promising new direction for future research on agent memory systems.
📝 Abstract
Memory emerges as a core module in large language model (LLM)-based agents for long-horizon, complex tasks (e.g., multi-turn dialogue, game playing, scientific discovery), where it enables knowledge accumulation, iterative reasoning, and self-evolution. A number of memory methods have been proposed in the literature, but they have not been systematically and comprehensively compared under the same experimental settings. In this paper, we first summarize a unified framework that incorporates existing agent memory methods from a high-level perspective. We then extensively compare representative agent memory methods on two well-known benchmarks, examining the effectiveness of each method and providing a thorough analysis. As a byproduct of our experimental analysis, we also design a new memory method that combines modules from existing methods and outperforms the state-of-the-art methods. Finally, based on these findings, we outline promising opportunities for future research. We believe that a deeper understanding of the behavior of existing methods can provide valuable new insights for future work on agent memory systems.