🤖 AI Summary
This work addresses the limitations of existing large language model agents that treat memory extraction as a static process, resulting in poor generalization and susceptibility to instance-level noise. To overcome these issues, the authors propose a unified framework for memory extraction and management that jointly optimizes both steps through semantic neighborhood modeling and a neighborhood-level marginal utility reward mechanism. This approach improves the generalization of extracted memories across semantically related query clusters. By combining the GRPO optimization algorithm with a co-training strategy, the method significantly outperforms strong baselines across five benchmarks, achieving up to a 10.67% performance gain on multi-turn interactive tasks and demonstrating monotonic performance improvement during continual evolution.
📝 Abstract
Self-evolving memory serves as the trainable parameters of Large Language Model (LLM)-based agents, where extraction (distilling insights from experience) and management (updating the memory bank) must be tightly coordinated. Existing methods predominantly optimize memory management while treating memory extraction as a static process, resulting in poor generalization, where agents accumulate instance-specific noise rather than robust memories. To address this, we propose Unified Memory Extraction and Management (UMEM), a self-evolving agent framework that jointly optimizes a Large Language Model to simultaneously extract and manage memories. To mitigate overfitting to specific instances, we introduce Semantic Neighborhood Modeling and optimize the model with a neighborhood-level marginal utility reward via GRPO. This approach ensures memory generalizability by evaluating memory utility across clusters of semantically related queries. Extensive experiments across five benchmarks demonstrate that UMEM significantly outperforms highly competitive baselines, achieving up to a 10.67% improvement in multi-turn interactive tasks. Furthermore, UMEM maintains a monotonic growth curve during continuous evolution. Code and models will be publicly released.
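The neighborhood-level marginal utility idea can be sketched as follows: instead of scoring a candidate memory only on the query it was extracted from, score its average benefit over a cluster of semantically related queries. This is a minimal illustrative sketch, not the paper's released implementation; the function name and the assumption that per-query task scores are available are hypothetical.

```python
# Hypothetical sketch of a neighborhood-level marginal utility reward.
# Assumes each query in a semantic neighborhood can be scored with and
# without the candidate memory; names are illustrative, not from UMEM's code.
from statistics import mean


def neighborhood_marginal_utility(scores_with_memory, scores_without_memory):
    """Reward a candidate memory by its average score gain across a
    neighborhood of related queries, so memories that only help one
    specific instance (noise) receive little or no reward."""
    if len(scores_with_memory) != len(scores_without_memory):
        raise ValueError("score lists must cover the same neighborhood")
    return mean(scores_with_memory) - mean(scores_without_memory)


# Example: a memory that helps two of three related queries still earns
# a positive (but tempered) reward over the whole neighborhood.
reward = neighborhood_marginal_utility([0.9, 0.8, 0.4], [0.6, 0.7, 0.5])
```

In a GRPO-style setup, such a reward would be computed for each sampled extraction/management action in a group and used to form group-relative advantages, but the exact reward shaping in UMEM is described only at this high level in the abstract.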