🤖 AI Summary
Current large language model (LLM) agents treat skills as isolated, static artifacts, which limits their reusability, reliability, and long-term improvement. This work proposes MUSE-Autoskill Agent, a skill-centric agent framework that unifies the skill lifecycle (creation, memory, management, evaluation, and refinement) by introducing skill-level memory and closed-loop feedback. Within this framework, skills become long-lived, testable, transferable assets that accumulate experience across tasks. By combining skill storage and retrieval, unit-test-based validation, and cross-task experience consolidation, the approach shows initial evidence of improved task success, execution efficiency, skill reuse, and cross-agent transfer on the SkillsBench benchmark.
📝 Abstract
Large language model (LLM) agents rely on reusable skills to solve complex tasks. However, existing skill creation approaches treat skills as isolated and static artifacts, limiting their reusability, reliability, and long-term improvement. We propose MUSE-Autoskill Agent (Memory-Utilizing Skill Evolution), a skill-centric agent framework that lets agents continuously improve their task-solving capability by creating, reusing, and refining skills under a unified lifecycle (creation, memory, management, evaluation, and refinement). Our framework enables agents to create skills on demand, store and reuse them across tasks, organize and select them efficiently, and evaluate them through unit tests and runtime feedback for continuous refinement. We further introduce skill-level memory that accumulates experience for each skill across tasks, enabling more effective reuse and adaptation over time. Experiments on SkillsBench provide initial evidence that lifecycle-managed skills can improve task success, efficiency, reuse, and cross-agent transfer, highlighting the importance of treating skills as long-lived, experience-aware, and testable assets.
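The lifecycle described in the abstract can be illustrated with a minimal Python sketch. This is not the MUSE-Autoskill implementation, which the abstract does not specify. The names `Skill`, `SkillLibrary`, `passes_tests`, `run`, and `refine` are hypothetical, and the sketch assumes a skill is a plain callable, its unit tests are (arguments, expected result) pairs, and skill-level memory is a list of per-task notes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Skill:
    """Hypothetical skill record: code, its unit tests, and its own memory."""
    name: str
    fn: Callable[..., object]
    # Unit tests as (args, expected_result) pairs.
    tests: List[Tuple[tuple, object]] = field(default_factory=list)
    # Skill-level memory: one note per use, accumulated across tasks.
    memory: List[str] = field(default_factory=list)
    successes: int = 0
    failures: int = 0

    def passes_tests(self) -> bool:
        """Evaluation: every test must match; an exception counts as a failure."""
        for args, expected in self.tests:
            try:
                if self.fn(*args) != expected:
                    return False
            except Exception:
                return False
        return True


class SkillLibrary:
    """Hypothetical store covering creation, reuse, feedback, and refinement."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def add(self, skill: Skill) -> bool:
        """Creation and storage: admit a skill only if its unit tests pass."""
        if not skill.passes_tests():
            return False
        self._skills[skill.name] = skill
        return True

    def get(self, name: str) -> Optional[Skill]:
        return self._skills.get(name)

    def run(self, name: str, task_id: str, *args: object) -> object:
        """Reuse: execute a stored skill and record runtime feedback in its memory."""
        skill = self._skills[name]
        try:
            result = skill.fn(*args)
        except Exception as exc:
            skill.failures += 1
            skill.memory.append(f"{task_id}: failed with {type(exc).__name__}")
            raise
        skill.successes += 1
        skill.memory.append(f"{task_id}: ok")
        return result

    def refine(self, name: str, new_fn: Callable[..., object]) -> bool:
        """Refinement: swap the implementation only if the existing tests still pass."""
        old = self._skills[name]
        candidate = Skill(name, new_fn, old.tests, old.memory,
                          old.successes, old.failures)
        if not candidate.passes_tests():
            return False
        self._skills[name] = candidate  # memory and counters carry over
        return True
```

In this sketch, a failed call leaves a note in the skill's memory, and a later `refine` keeps that history while replacing the code, which is one simple way to read "experience-aware and testable" skills. How the actual framework selects skills, generates tests, or uses the memory during refinement is not stated in the abstract.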