AI Summary
This work addresses the challenge that large language model (LLM) agents struggle to effectively accumulate and reuse experience, as existing approaches extract only flat textual knowledge, failing to capture the procedural logic of complex subtasks and lacking robust mechanisms for knowledge base maintenance. To overcome this, we propose AutoRefine, a framework that, for the first time, enables dual extraction and co-maintenance of procedural and static experience: the former is embodied in specialized sub-agents with independent reasoning and memory capabilities, while the latter is distilled into reusable skill patterns, such as guidelines or code snippets. A continuous scoring, pruning, and merging mechanism prevents experience degradation over time. Experiments demonstrate that AutoRefine achieves success rates of 98.4%, 70.4%, and 27.1% on ALFWorld, ScienceWorld, and TravelPlanner, respectively, reducing action steps by 20-73% and substantially outperforming handcrafted systems on TravelPlanner (27.1% vs. 12.1%).
Abstract
Large language model agents often fail to accumulate knowledge from experience, treating each task as an independent challenge. Recent methods extract experience as flattened textual knowledge, which cannot capture the procedural logic of complex subtasks. They also lack maintenance mechanisms, causing repository degradation as experience accumulates. We introduce AutoRefine, a framework that extracts and maintains dual-form Experience Patterns from agent execution histories. For procedural subtasks, we extract specialized subagents with independent reasoning and memory. For static knowledge, we extract skill patterns as guidelines or code snippets. A continuous maintenance mechanism scores, prunes, and merges patterns to prevent repository degradation. Evaluated on ALFWorld, ScienceWorld, and TravelPlanner, AutoRefine achieves success rates of 98.4%, 70.4%, and 27.1%, respectively, with 20-73% step reductions. On TravelPlanner, automatic extraction exceeds manually designed systems (27.1% vs. 12.1%), demonstrating its ability to capture procedural coordination.
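To make the dual-form repository and its maintenance loop concrete, here is a minimal Python sketch. It is an illustration only, not the paper's implementation: the `ExperiencePattern` dataclass, the success-rate scoring rule, and the name-based merge key are all assumptions introduced for this example; the paper specifies only that patterns are continuously scored, pruned, and merged.

```python
from dataclasses import dataclass

@dataclass
class ExperiencePattern:
    """One entry in the experience repository (hypothetical schema).

    kind is "subagent" for procedural patterns (a sub-agent with its own
    reasoning/memory prompt) or "skill" for static patterns (a guideline
    or code snippet).
    """
    name: str
    kind: str
    content: str
    uses: int = 0
    successes: int = 0

    def score(self) -> float:
        # Assumed scoring rule: smoothed success rate, so patterns with
        # few observations are not pruned immediately.
        return (self.successes + 1) / (self.uses + 2)

def maintain(repo, prune_below=0.3):
    """One maintenance pass: prune low-scoring patterns, merge duplicates."""
    # 1. Prune patterns whose score has degraded below the threshold.
    kept = [p for p in repo if p.score() >= prune_below]
    # 2. Merge duplicates (here: same case-insensitive name), pooling
    #    their usage statistics so the merged pattern keeps its history.
    merged = {}
    for p in kept:
        key = p.name.lower()
        if key in merged:
            merged[key].uses += p.uses
            merged[key].successes += p.successes
        else:
            merged[key] = p
    return list(merged.values())

repo = [
    ExperiencePattern("Navigate", "subagent", "go-to-receptacle routine", 10, 9),
    ExperiencePattern("navigate", "subagent", "duplicate extraction", 4, 3),
    ExperiencePattern("BadTip", "skill", "misleading guideline", 8, 1),
]
repo = maintain(repo)  # BadTip pruned (score 0.2); the two Navigate entries merged
```

The smoothing prior in `score` and the exact merge criterion (string equality vs. semantic similarity) are design choices; a real system would likely use an LLM or embedding similarity to decide which patterns to merge.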