🤖 AI Summary
This work addresses the challenge that large language models (LLMs) struggle to continually learn without weight updates, as existing memory-based approaches are often vulnerable to noise and lack mechanisms for active optimization. To overcome this, the authors propose MemRL, a framework that decouples the stable reasoning capabilities of a frozen LLM from a plastic episodic memory module. MemRL enables runtime self-evolution through non-parametric reinforcement learning, featuring a two-stage retrieval mechanism—semantic filtering followed by Q-value-based selection—and leverages environmental feedback to update Q-values online. Experiments on HLE, BigCodeBench, ALFWorld, and Lifelong Agent Bench demonstrate that MemRL significantly outperforms current methods, establishing its effectiveness in achieving efficient continual learning without fine-tuning.
📝 Abstract
The hallmark of human intelligence is the self-evolving ability to master new skills by learning from past experiences. However, current AI agents struggle to emulate this self-evolution: fine-tuning is computationally expensive and prone to catastrophic forgetting, while existing memory-based methods rely on passive semantic matching that often retrieves noise. To address these challenges, we propose MemRL, a non-parametric approach that evolves via reinforcement learning on episodic memory. By decoupling stable reasoning from plastic memory, MemRL employs a Two-Phase Retrieval mechanism to filter noise and identify high-utility strategies through environmental feedback. Extensive experiments on HLE, BigCodeBench, ALFWorld, and Lifelong Agent Bench demonstrate that MemRL significantly outperforms state-of-the-art baselines, confirming that MemRL effectively resolves the stability-plasticity dilemma and enables continuous runtime improvement without weight updates. Code is available at https://github.com/MemTensor/MemRL.
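The two-stage retrieval described above (semantic filtering followed by Q-value-based selection, with Q-values updated online from environmental feedback) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class and method names, the incremental update rule, and the cosine-similarity filter are all assumptions for the sake of example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class EpisodicMemory:
    """Hypothetical MemRL-style memory: frozen LLM reasoning is assumed
    to happen elsewhere; only this module is plastic."""

    def __init__(self, alpha=0.1):
        self.entries = []   # each entry: {"emb": ..., "strategy": ..., "q": ...}
        self.alpha = alpha  # step size for online Q-value updates

    def add(self, embedding, strategy, q_init=0.0):
        self.entries.append({"emb": embedding, "strategy": strategy, "q": q_init})

    def retrieve(self, query_emb, k_semantic=5, k_final=2):
        # Stage 1: semantic filtering, keep the k most similar entries.
        candidates = sorted(
            self.entries,
            key=lambda e: cosine(e["emb"], query_emb),
            reverse=True,
        )[:k_semantic]
        # Stage 2: among the survivors, select by learned utility (Q-value).
        return sorted(candidates, key=lambda e: e["q"], reverse=True)[:k_final]

    def update(self, entry, reward):
        # Online Q-value update from environmental feedback:
        # move the estimate toward the observed reward.
        entry["q"] += self.alpha * (reward - entry["q"])
```

A typical loop would embed the incoming task, call `retrieve` to condition the frozen LLM on high-utility strategies, then call `update` with the environment's reward, so that noisy or unhelpful memories sink in Q-value and stop being selected over time.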