Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current LLM-based agents exhibit fundamental limitations in realistic, long-horizon tasks: they remain static at inference time and lack mechanisms for continual learning and self-evolution from experience. This work introduces MUSE—a memory-augmented, self-evolving agent framework driven by experience. MUSE integrates hierarchical memory modules, trajectory reflection mechanisms, and structured experience encoding/retrieval to enable online knowledge accumulation and self-optimization without retraining. Its core innovation is the first integration of a dynamic experience reflection mechanism into the agent architecture, supporting zero-shot cross-task transfer. Built upon the lightweight Gemini-2.5 Flash model, MUSE achieves state-of-the-art performance on the TAC benchmark. Empirical results demonstrate sustained improvement in task capability as experience accumulates, alongside strong generalization across diverse tasks—marking a significant departure from static LLM agent paradigms.

📝 Abstract
Large Language Models have demonstrated remarkable capabilities across diverse domains, yet significant challenges persist when deploying them as AI agents for real-world long-horizon tasks. Existing LLM agents suffer from a critical limitation: they are test-time static and cannot learn from experience, lacking the ability to accumulate knowledge and continuously improve on the job. To address this challenge, we propose MUSE, a novel agent framework that introduces an experience-driven, self-evolving system centered around a hierarchical Memory Module. MUSE organizes diverse levels of experience and leverages them to plan and execute long-horizon tasks across multiple applications. After each sub-task execution, the agent autonomously reflects on its trajectory, converting the raw trajectory into structured experience and integrating it back into the Memory Module. This mechanism enables the agent to evolve beyond its static pretrained parameters, fostering continuous learning and self-evolution. We evaluate MUSE on the long-horizon productivity benchmark TAC. It achieves new SOTA performance by a significant margin using only a lightweight Gemini-2.5 Flash model. Extensive experiments demonstrate that as the agent autonomously accumulates experience, it exhibits increasingly superior task completion capabilities, as well as robust continuous learning and self-evolution capabilities. Moreover, the accumulated experience from MUSE exhibits strong generalization properties, enabling zero-shot improvement on new tasks. MUSE establishes a new paradigm for AI agents capable of real-world productivity task automation.
Problem

Research questions and friction points this paper is trying to address.

LLM agents cannot learn from experience during deployment
Static agents lack continuous improvement for long-horizon tasks
Existing frameworks cannot accumulate knowledge through task execution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Experience-driven self-evolving system for long-horizon tasks
Hierarchical Memory Module organizes diverse experience levels
Autonomous trajectory reflection converts experience into structured memory
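The reflect-and-store loop behind these contributions can be sketched as follows. This is a minimal illustration only: the class and function names (`Experience`, `MemoryModule`, `reflect`) and the keyword-overlap retrieval are assumptions for exposition, not the paper's actual interfaces, and the reflection step stands in for the LLM-based reflection MUSE performs.

```python
from dataclasses import dataclass, field

@dataclass
class Experience:
    task: str    # which sub-task produced this experience
    lesson: str  # structured takeaway distilled from the trajectory
    level: str   # hierarchy level, e.g. "strategic" or "procedural"

@dataclass
class MemoryModule:
    # Hierarchical store: experiences grouped by level.
    levels: dict = field(default_factory=dict)

    def integrate(self, exp: Experience) -> None:
        self.levels.setdefault(exp.level, []).append(exp)

    def retrieve(self, task: str) -> list:
        # Naive keyword overlap stands in for the paper's retrieval scheme.
        return [e for lvl in self.levels.values() for e in lvl
                if any(word in e.task for word in task.split())]

def reflect(task: str, trajectory: list) -> Experience:
    # Stand-in for LLM-based reflection: distill the raw trajectory
    # into one structured lesson.
    lesson = f"completed in {len(trajectory)} steps; last action: {trajectory[-1]}"
    return Experience(task=task, lesson=lesson, level="procedural")

memory = MemoryModule()
trajectory = ["open spreadsheet", "filter rows", "export csv"]
memory.integrate(reflect("export filtered data", trajectory))

# On a later, related task the agent retrieves prior experience before planning.
hits = memory.retrieve("export report data")
print(len(hits))  # 1
```

The point of the sketch is the cycle itself: execute a sub-task, reflect the raw trajectory into a structured record, integrate it into the memory hierarchy, and consult that memory before planning the next task, all without updating model parameters.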