AI Summary
Current large language model agents rely heavily on external guidance for skill acquisition and selection, lacking the capacity for autonomous internalization and evolution of skills. This work proposes SkillMaster, a novel framework that enables agents to autonomously evaluate their own skills based on execution trajectories and refine them through counterfactual utility assessment. To further decouple the intertwined optimization of task execution and skill management, we introduce the DualAdv-GRPO algorithm, which facilitates effective joint training. Evaluated on ALFWorld and WebShop, SkillMaster achieves state-of-the-art performance, surpassing existing baselines by 8.8% and 9.3% in success rate, respectively. These results demonstrate a significant improvement in the agent's self-improvement capability and cross-task skill transferability.
Abstract
Skills provide an effective mechanism for improving LLM agents on complex tasks, yet in existing agent frameworks, their creation, refinement, and selection are typically governed by external teachers, hand-designed rules, or auxiliary modules. As a result, skills remain external resources to be invoked, rather than capabilities that agents can develop, adapt, and internalize through experience. To endow LLM agents with autonomous skill mastery, we propose SkillMaster, a training framework that teaches agents to create new skills, refine existing skills, and select accumulated skills during task solving. This capability is achieved through three key designs. First, we train agents through trajectory-informed skill review, teaching agents to propose, update, or retain skills based on evidence from completed episodes. Second, each candidate skill edit is designed to be evaluated by its counterfactual utility on related probe tasks, providing a direct learning signal for training skill-editing decisions. Third, we introduce DualAdv-GRPO, which separately estimates advantages for task-solving actions and skill-editing decisions, stabilizing joint training across task solving and skill management. Experiments on ALFWorld and WebShop show that SkillMaster improves the overall success rate over state-of-the-art baselines by 8.8% and 9.3%, respectively, achieving the best performance among all compared methods. Further analysis reveals a marked shift in agent capability: agents trained with SkillMaster can identify skill failures, refine procedural knowledge from trajectory evidence, and transfer improvements to future tasks with limited skill-bank edits. Overall, SkillMaster moves LLM agents beyond mere skill use toward self-improving agents capable of developing, adapting, and applying their own skill repertoires.
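The abstract does not give the formulas behind counterfactual utility or DualAdv-GRPO, so the following is only an illustrative sketch under stated assumptions. It assumes that a skill edit's counterfactual utility can be approximated as the mean change in probe-task success with versus without the edit, and that "separately estimating advantages" means applying the usual GRPO-style group normalization (reward minus group mean, divided by group standard deviation) independently to task rewards and to skill-edit utilities. All function and parameter names here are hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from statistics import mean, pstdev
from typing import Callable, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style normalization within one sampled group:
    (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def counterfactual_utility(
    probe_tasks: Sequence[str],
    success_with_edit: Callable[[str], float],
    success_without_edit: Callable[[str], float],
) -> float:
    """Assumed definition: mean change in success on related probe tasks
    when the candidate skill edit is applied versus when it is not."""
    deltas = [success_with_edit(t) - success_without_edit(t) for t in probe_tasks]
    return mean(deltas)


def dual_advantages(
    task_rewards: Sequence[float],
    edit_utilities: Sequence[float],
) -> tuple[list[float], list[float]]:
    """Assumed reading of 'dual advantages': normalize the two reward
    streams separately so neither signal's scale dominates the other."""
    return (
        group_relative_advantages(task_rewards),
        group_relative_advantages(edit_utilities),
    )


if __name__ == "__main__":
    # Hypothetical group of four rollouts: task success and skill-edit utility.
    task_adv, edit_adv = dual_advantages(
        task_rewards=[1.0, 0.0, 1.0, 0.0],
        edit_utilities=[0.2, -0.1, 0.0, 0.3],
    )
    print(task_adv)
    print(edit_adv)
```

Keeping the two normalizations separate is one plausible way to prevent a sparse task-success signal and a differently scaled utility signal from interfering during joint training; the paper's actual estimator may differ.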