🤖 AI Summary
This work addresses two problems with large language model (LLM) agents: their skill ecosystems are fragmented, and their skills have limited capacity to evolve autonomously. It proposes what the authors describe as the first training-free framework for skill self-optimization. The framework integrates four components: diverse task generation, lightweight prompt and code refinement, comparative execution, and traceable multidimensional evaluation. It runs in both virtual (simulated) and real modes. Using a training-agnostic Group Relative Policy Optimization (GRPO) approach, the system evolves skills automatically and tracks their performance. Evaluated on the Skill-X benchmark of 48 distinct skills, the method is reported to substantially improve overall skill performance across diverse task categories, which supports its effectiveness and generalization.
📝 Abstract
We introduce Skills-Coach, an automated framework for enhancing the self-evolution of skills in Large Language Model (LLM)-based agents. To address the current fragmentation of the skill ecosystem, Skills-Coach probes the boundaries of skill capabilities and thereby broadens the competency coverage that intelligent applications require. The framework comprises four core modules: a Diverse Task Generation Module that systematically creates a comprehensive test suite for each skill; a Lightweight Optimization Module that optimizes skill prompts and their corresponding code; a Comparative Execution Module that executes both the original and the optimized skills so they can be evaluated side by side; and a Traceable Evaluation Module that rigorously evaluates performance against specified criteria. Skills-Coach can run in either a virtual mode or a real mode. To validate its efficacy, we introduce Skill-X, a benchmark dataset of 48 diverse skills. Experimental results show that Skills-Coach significantly improves skill capability across a wide range of categories, highlighting its potential to advance the development of more robust and adaptable LLM-based agents.
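The abstract does not specify interfaces, so the following is only a minimal sketch of how the four modules might fit together in a single optimization pass. Every name here (`Skill`, `TraceRecord`, `generate_tasks`, `optimize`, `compare`, `coach`, `virtual_execute`, `toy_score`) is hypothetical and not taken from the Skills-Coach implementation. The task generator, optimizer, executor, and scorer are toy stand-ins for what would presumably be LLM-driven components, and the "virtual mode" is approximated by an executor that calls no real tool.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Skill:
    """Hypothetical skill record: a name plus the prompt that drives it."""
    name: str
    prompt: str


@dataclass(frozen=True)
class TraceRecord:
    """One row of the evaluation trace: per-task scores for both variants."""
    task: str
    original_score: float
    optimized_score: float


# Assumed signatures: an executor runs a skill on a task and returns its output;
# a scorer rates that output against the task.
Executor = Callable[[Skill, str], str]
Scorer = Callable[[str, str], float]


def generate_tasks(skill: Skill, n: int) -> List[str]:
    """Stand-in for diverse task generation: templated test cases."""
    return [f"{skill.name}: test case {i}" for i in range(1, n + 1)]


def optimize(skill: Skill) -> Skill:
    """Stand-in for lightweight optimization: append one instruction to the prompt."""
    return Skill(skill.name, skill.prompt + " State each step explicitly.")


def compare(original: Skill, candidate: Skill, tasks: List[str],
            execute: Executor, score: Scorer) -> List[TraceRecord]:
    """Comparative execution: run both variants on every task and record scores."""
    return [
        TraceRecord(
            task=t,
            original_score=score(t, execute(original, t)),
            optimized_score=score(t, execute(candidate, t)),
        )
        for t in tasks
    ]


def coach(skill: Skill, execute: Executor, score: Scorer,
          n_tasks: int = 3) -> Tuple[Skill, List[TraceRecord]]:
    """One pass: generate tasks, optimize, compare, keep the better variant.

    n_tasks must be at least 1, otherwise the mean is undefined.
    """
    tasks = generate_tasks(skill, n_tasks)
    candidate = optimize(skill)
    trace = compare(skill, candidate, tasks, execute, score)
    improved = (mean(r.optimized_score for r in trace)
                > mean(r.original_score for r in trace))
    return (candidate if improved else skill), trace


def virtual_execute(skill: Skill, task: str) -> str:
    """Toy 'virtual mode' executor: echoes prompt and task instead of calling a tool."""
    return f"{skill.prompt} | {task}"


def toy_score(task: str, output: str) -> float:
    """Toy criterion: reward outputs produced under the explicit-steps instruction."""
    return 1.0 if "explicitly" in output else 0.0


best, trace = coach(Skill("summarize", "Summarize the input."),
                    virtual_execute, toy_score)
for record in trace:
    print(record)
```

Keeping the executor and scorer as injected callables is one way to model the two execution modes: a real-mode run would swap `virtual_execute` for a function that actually invokes the skill, while the returned list of `TraceRecord` objects preserves the per-task evidence behind the keep-or-discard decision.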