🤖 AI Summary
Current large language model agents struggle to reuse execution experience efficiently. They often rely on fragmented reflections or unstructured memory, so improvement is delayed and inefficient. This work proposes EXG, a framework that gives self-evolving agents a structured experience graph in which task successes and failures are explicitly organized as relational representations. The graph can grow online, in real time during execution, for immediate cross-task experience reuse, and it can also be used offline as an external memory module that consolidates historical experience. Because EXG is designed to be plug-and-play, it can be added to existing self-evolving agent architectures. On code generation and reasoning benchmarks, it achieves more favorable performance-efficiency trade-offs than reflection- and memory-based baselines in both online and offline evaluations.
📝 Abstract
Large language model (LLM)-based agents have demonstrated strong capabilities in complex reasoning and problem solving through multi-step interactions, yet most deployed agents remain behaviorally static, with knowledge acquired during execution rarely translating into systematic improvement over time. In response, a growing line of work on self-evolving agents explores how agents can improve through experience during deployment, but most existing approaches either rely on ad hoc reflection limited to single-task correction or adopt unstructured memory that accumulates fragmented experience with delayed usability. To address these limitations, we introduce EXG, an experience graph framework for self-evolving agents that explicitly organizes accumulated successes and failures into a structured, relational representation. EXG is the first experience graph designed for self-evolving agents, supporting both online, real-time graph growth during execution for immediate cross-task experience reuse, and offline reuse of a consolidated experience graph as an external memory module. This design also enables EXG to serve as a plug-and-play component for existing self-evolving agents, organizing prior experience into a unified experience graph and improving both solution quality and resource efficiency as deployment progresses. Extensive experiments across code generation and reasoning benchmarks show that EXG attains more favorable performance-efficiency trade-offs than reflection- and memory-based baselines in both online and offline evaluations. Our results suggest that structuring experience as a graph provides a principled foundation for scalable and transferable self-evolving agent behavior.
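The abstract does not say how EXG represents nodes, draws edges, or retrieves experience, so the following is only a minimal, hypothetical Python sketch of the general idea: an experience graph that grows online as attempts (successes and failures) are recorded, retrieves related experience for a new task by following graph edges, and can be serialized for offline reuse as an external memory. Every name here (`Experience`, `ExperienceGraph`, `add`, `retrieve`, `to_json`, `from_json`), the keyword-overlap rule for creating edges, and the 0.5 neighbour weight are our own assumptions, not the authors' design.

```python
"""Hypothetical experience-graph sketch; NOT the EXG authors' implementation."""
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class Experience:
    # One recorded attempt: the task, whether it succeeded, and a distilled lesson.
    node_id: int
    task: str
    succeeded: bool
    lesson: str
    keywords: set[str] = field(default_factory=set)


class ExperienceGraph:
    def __init__(self) -> None:
        self.nodes: dict[int, Experience] = {}
        # Undirected relational edges: node_id -> ids of related experiences.
        self.edges: dict[int, set[int]] = {}

    @staticmethod
    def _keywords(text: str) -> set[str]:
        # Assumption: crude lexical features; a real system would likely use embeddings.
        return {w.lower().strip(".,:;") for w in text.split() if len(w) > 3}

    def add(self, task: str, succeeded: bool, lesson: str) -> int:
        """Online growth: insert an experience and link it to related ones."""
        node_id = len(self.nodes)
        kw = self._keywords(task)
        self.edges[node_id] = set()
        for other in self.nodes.values():
            if kw & other.keywords:  # any shared keyword creates an edge
                self.edges[node_id].add(other.node_id)
                self.edges[other.node_id].add(node_id)
        self.nodes[node_id] = Experience(node_id, task, succeeded, lesson, kw)
        return node_id

    def retrieve(self, task: str, k: int = 3) -> list[Experience]:
        """Cross-task reuse: score direct matches, then propagate to neighbours."""
        kw = self._keywords(task)
        scores: dict[int, float] = {}
        for node in self.nodes.values():
            overlap = len(kw & node.keywords)
            if overlap:
                scores[node.node_id] = scores.get(node.node_id, 0.0) + overlap
                for nb in self.edges[node.node_id]:
                    # Related experiences get a smaller, propagated score.
                    scores[nb] = scores.get(nb, 0.0) + 0.5 * overlap
        # Highest score first; ties broken by insertion order.
        ranked = sorted(scores, key=lambda i: (-scores[i], i))
        return [self.nodes[i] for i in ranked[:k]]

    def to_json(self) -> str:
        """Offline consolidation: serialize the graph as an external memory."""
        return json.dumps({
            "nodes": [{**asdict(n), "keywords": sorted(n.keywords)}
                      for n in self.nodes.values()],
            "edges": {str(i): sorted(nbrs) for i, nbrs in self.edges.items()},
        })

    @classmethod
    def from_json(cls, data: str) -> ExperienceGraph:
        """Reload a consolidated graph, e.g. to plug into a different agent."""
        raw = json.loads(data)
        graph = cls()
        for n in raw["nodes"]:
            graph.nodes[n["node_id"]] = Experience(
                n["node_id"], n["task"], n["succeeded"], n["lesson"], set(n["keywords"]))
        graph.edges = {int(i): set(nbrs) for i, nbrs in raw["edges"].items()}
        return graph
```

For example, after recording "parse CSV file into records" (success), "parse JSON config file" (failure), and "sort records by timestamp" (success), a call to `retrieve("load JSON file")` ranks the failed JSON attempt first, then the CSV attempt, and also surfaces the sorting experience even though it shares no keyword with the query, because it is linked to the CSV node. Keeping failures in the graph lets an agent reuse them as warnings, and the JSON round trip illustrates, under these assumptions, how an online-built graph could later serve as an offline memory module.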