EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle

📅 2025-10-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current LLM-based agents excel at tool invocation but cannot iteratively refine their decision-making strategies from accumulated interaction experience. This paper introduces EvolveR, an experience-driven agent framework that enables closed-loop self-evolution without external supervision. The method comprises two synergistic components: (1) offline policy distillation, which combines self-distillation with structured knowledge storage to extract reusable strategies, and (2) online interactive decision-making, which unifies retrieval-augmented reasoning with policy reinforcement learning for adaptive execution. The core contribution is an automated mechanism for extracting and dynamically updating generalizable, reusable policies directly from behavioral trajectories, enabling the agent to induce high-level problem-solving patterns from raw interactions. Empirical evaluation on multi-hop question-answering benchmarks demonstrates significant improvements over strong agentic baselines, validating the efficacy of autonomous strategy evolution. The implementation is publicly available.

📝 Abstract
Current Large Language Model (LLM) agents show strong performance in tool use but lack the crucial capability to systematically learn from their own experiences. While existing frameworks mainly focus on mitigating external knowledge gaps, they fail to address a more fundamental limitation: the inability to iteratively refine problem-solving strategies. In this work, we introduce EvolveR, a framework designed to enable agents to self-improve through a complete, closed-loop experience lifecycle. This lifecycle comprises two key stages: (1) Offline Self-Distillation, where the agent's interaction trajectories are synthesized into a structured repository of abstract, reusable strategic principles; and (2) Online Interaction, where the agent interacts with tasks, actively retrieves distilled principles to guide its decision-making, and accumulates a diverse set of behavioral trajectories. The loop employs a policy reinforcement mechanism to iteratively update the agent based on its performance. We demonstrate the effectiveness of EvolveR on complex multi-hop question-answering benchmarks, where it achieves superior performance over strong agentic baselines. Our work presents a comprehensive blueprint for agents that learn not only from external data but also from the consequences of their own actions, paving the way for more autonomous and continuously improving systems. Code is available at https://github.com/Edaizi/EvolveR.
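The two-stage lifecycle described above can be sketched as a minimal loop. This is an illustrative toy, not the authors' implementation: the class and function names (`PrincipleRepository`, `run_episode`, `distill`) are hypothetical, keyword matching stands in for retrieval, and a trivial success rule stands in for the LLM policy and its reinforcement update.

```python
# Hypothetical sketch of a closed-loop experience lifecycle:
# online episodes guided by retrieved principles, then offline
# distillation of failures into new reusable principles.
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    task: str
    steps: list
    success: bool


@dataclass
class PrincipleRepository:
    """Structured store of abstract, reusable strategic principles."""
    principles: dict = field(default_factory=dict)  # keyword -> principle

    def retrieve(self, task: str) -> list:
        # Naive keyword match stands in for the retrieval component.
        return [p for kw, p in self.principles.items() if kw in task.lower()]

    def update(self, keyword: str, principle: str) -> None:
        self.principles[keyword] = principle


def run_episode(task: str, repo: PrincipleRepository) -> Trajectory:
    """Online stage: act on a task, guided by retrieved principles."""
    guidance = repo.retrieve(task)
    # Placeholder policy: succeed whenever any principle applies.
    steps = [f"apply: {p}" for p in guidance] or ["explore from scratch"]
    return Trajectory(task=task, steps=steps, success=bool(guidance))


def distill(trajectories: list, repo: PrincipleRepository) -> None:
    """Offline stage: compress past trajectories into principles."""
    for traj in trajectories:
        if traj.success:
            continue
        # In the paper this abstraction is done by the LLM itself;
        # here we record a trivial lesson keyed on the first task word.
        keyword = traj.task.lower().split()[0]
        repo.update(keyword, f"decompose '{traj.task}' into sub-questions")


repo = PrincipleRepository()
tasks = ["compare two authors' birthplaces", "compare two rivers' lengths"]

# Iteration 1: no principles yet, so episodes fail; distillation learns one.
first = [run_episode(t, repo) for t in tasks]
distill(first, repo)

# Iteration 2: the distilled principle now guides both tasks.
second = [run_episode(t, repo) for t in tasks]
```

The point of the sketch is the data flow: trajectories from online interaction feed offline distillation, and the resulting principles feed back into the next round of online decisions.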
Problem

Research questions and friction points this paper is trying to address.

Enabling LLM agents to systematically learn from their own experiences
Addressing the inability to iteratively refine problem-solving strategies
Creating self-improving agents through closed-loop experience lifecycle
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-evolving agents through experience-driven lifecycle
Offline distillation synthesizes reusable strategic principles
Online interaction retrieves principles for decision-making guidance