AI Summary
This work addresses a limitation of existing large language model (LLM) agents: their long-term memory systems keep retrieval mechanisms fixed after deployment, which hinders adaptation across sessions. The authors propose EvolveMem, a self-evolving memory architecture in which an LLM-driven diagnostic module analyzes per-question failure logs and automatically adjusts the retrieval configuration. Combined with rollback on regression and exploration on stagnation, the system forms a closed loop of autonomous evolution. According to the authors, this co-evolves memory content and retrieval mechanisms, uncovers configuration dimensions beyond the original action space, and yields positive cross-benchmark transfer. On the LoCoMo and MemBench benchmarks, the method outperforms the strongest baselines by 25.7% and 18.9% relative, respectively, and the evolved configurations transfer across benchmarks.
Abstract
Long-term memory is essential for LLM agents that operate across multiple sessions, yet existing memory systems treat retrieval infrastructure as fixed: stored content evolves while scoring functions, fusion strategies, and answer-generation policies remain frozen at deployment. We argue that truly adaptive memory requires co-evolution at two levels: the stored knowledge and the retrieval mechanism that queries it. We present EvolveMem, a self-evolving memory architecture that exposes its full retrieval configuration as a structured action space optimized by an LLM-powered diagnosis module. In each evolution round, the module reads per-question failure logs, identifies root causes, and proposes targeted configuration adjustments; a guarded meta-analyzer applies them with automatic revert-on-regression and explore-on-stagnation safeguards. This closed-loop self-evolution realizes an AutoResearch process: the system autonomously conducts iterative research cycles on its own architecture, replacing manual configuration tuning. Starting from a minimal baseline, the process converges autonomously, discovering effective retrieval strategies including entirely new configuration dimensions not present in the original action space. On LoCoMo, EvolveMem outperforms the strongest baseline by 25.7% relative and achieves a 78.0% relative improvement over the minimal baseline. On MemBench, EvolveMem exceeds the strongest baseline by 18.9% relative. Evolved configurations transfer across benchmarks with positive rather than catastrophic transfer, indicating that the self-evolution process captures universal retrieval principles rather than benchmark-specific heuristics. Code is available at https://github.com/aiming-lab/SimpleMem.
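The guarded loop described in the abstract (propose a configuration change, revert it on regression, explore on stagnation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the EvolveMem implementation: the names `evolve`, `evaluate`, `propose`, `explore`, `patience`, and the `top_k` knob are all hypothetical, and the LLM-powered diagnosis module is replaced by a plain callable.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]  # hypothetical: one integer knob per retrieval setting


def evolve(
    initial: Config,
    evaluate: Callable[[Config], float],
    propose: Callable[[Config, float], Config],
    explore: Callable[[Config], Config],
    rounds: int = 10,
    patience: int = 2,
) -> Tuple[Config, float, List[str]]:
    """Guarded search: accept improvements, revert regressions, explore when stalled."""
    best_cfg = dict(initial)
    best_score = evaluate(best_cfg)
    stalled = 0  # consecutive rounds without improvement
    history: List[str] = []

    for _ in range(rounds):
        if stalled >= patience:
            # Explore-on-stagnation: try a different kind of change.
            candidate, action = explore(best_cfg), "explore"
            stalled = 0
        else:
            # Stand-in for the diagnosis step that would read failure logs.
            candidate, action = propose(best_cfg, best_score), "propose"

        score = evaluate(candidate)
        if score > best_score:
            best_cfg, best_score = candidate, score
            stalled = 0
            history.append(action + ":accept")
        else:
            # Revert-on-regression: discard the candidate, keep the best config.
            stalled += 1
            history.append(action + ":revert")

    return best_cfg, best_score, history


# Toy stand-ins (assumptions for illustration only): the score peaks at top_k == 5.
def toy_evaluate(cfg: Config) -> float:
    return -abs(cfg["top_k"] - 5)


def toy_propose(cfg: Config, score: float) -> Config:
    return {**cfg, "top_k": cfg["top_k"] + 1}


def toy_explore(cfg: Config) -> Config:
    return {**cfg, "top_k": cfg["top_k"] - 2}


best_cfg, best_score, history = evolve(
    {"top_k": 1}, toy_evaluate, toy_propose, toy_explore
)
print(best_cfg, best_score, history)
```

In this toy run the loop climbs to `top_k = 5`, and every later candidate is reverted, so the best score never decreases. A real system would replace `toy_propose` with a model call that reads failure logs, and could also let proposals add new keys to the configuration, which is one way to read the abstract's mention of dimensions outside the original action space.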