🤖 AI Summary
This work identifies memory management as a critical factor governing the long-term behavior of LLM-based agents, showing that unbounded memory growth amplifies an "experience-following" property: when a task input is highly similar to the input of a retrieved memory record, the agent's output tends to closely follow the recorded output. This property gives rise to two failure modes: error propagation, where inaccuracies in stored experiences compound over time, and misaligned experience replay, where outdated or irrelevant experiences degrade performance on current tasks. To address these, the work proposes a selective add-and-delete memory management strategy, evaluated through a quantitative behavioral analysis that combines controlled experiments, similarity-driven memory retrieval evaluation, and stress testing under task distribution shift and constrained memory resources. Across diverse multi-task benchmarks, the approach achieves an average absolute performance improvement of 10% over naive memory growth. The code is publicly released, offering practical guidance for designing robust long-term memory systems in LLM agents.
📝 Abstract
Memory is a critical component in large language model (LLM)-based agents, enabling them to store and retrieve past executions to improve task performance over time. In this paper, we conduct an empirical study of how memory management choices affect LLM agents' behavior, especially their long-term performance. Specifically, we focus on two fundamental memory operations widely used by many agent frameworks: addition, which incorporates new experiences into the memory base, and deletion, which selectively removes past experiences. We systematically study the impact of these operations on agent behavior. Through quantitative analysis, we find that LLM agents display an experience-following property: high similarity between a task input and the input of a retrieved memory record often results in highly similar agent outputs. Our analysis further reveals two significant challenges associated with this property: error propagation, where inaccuracies in past experiences compound and degrade future performance, and misaligned experience replay, where outdated or irrelevant experiences negatively influence current tasks. Through controlled experiments, we show that combining selective addition and deletion strategies can mitigate these negative effects, yielding an average absolute performance gain of 10% over naive memory growth. Furthermore, we highlight how memory management choices affect agents' behavior under challenging conditions such as task distribution shifts and constrained memory resources. Our findings offer insights into the behavioral dynamics of LLM agent memory systems and provide practical guidance for designing memory components that support robust, long-term agent performance. We also release our code to facilitate further study.
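To make the two memory operations concrete, here is a minimal sketch of a similarity-based memory store with selective addition (a quality gate on new experiences) and selective deletion (evicting the lowest-quality records under a memory budget). This is a hypothetical illustration of the general idea, not the paper's actual implementation; the `MemoryBase` class, the `quality` score, and the threshold values are all assumptions made for the example.

```python
import numpy as np

class MemoryBase:
    """Illustrative sketch (not the paper's implementation) of an agent
    memory with similarity-driven retrieval, selective addition, and
    selective deletion."""

    def __init__(self, sim_threshold=0.9, max_records=1000):
        # Each record: (input_embedding, task_input, agent_output, quality)
        self.records = []
        self.sim_threshold = sim_threshold  # minimum similarity to replay a record
        self.max_records = max_records      # memory budget

    @staticmethod
    def _cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    def retrieve(self, query_emb):
        """Return the most similar past record if it clears the similarity
        threshold; otherwise return None (no experience to follow)."""
        if not self.records:
            return None
        best = max(self.records, key=lambda r: self._cosine(query_emb, r[0]))
        return best if self._cosine(query_emb, best[0]) >= self.sim_threshold else None

    def add(self, emb, task_input, output, quality):
        """Selective addition: store only experiences judged successful,
        limiting error propagation from bad experiences."""
        if quality >= 0.5:  # hypothetical quality gate
            self.records.append((emb, task_input, output, quality))
            self._evict()

    def _evict(self):
        """Selective deletion: when over budget, drop the lowest-quality
        records to reduce misaligned or outdated experience replay."""
        if len(self.records) > self.max_records:
            self.records.sort(key=lambda r: r[3], reverse=True)
            self.records = self.records[: self.max_records]
```

Under this sketch, naive memory growth corresponds to `add` with no quality gate and `_evict` disabled, which is the baseline the abstract's 10% gain is measured against.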