Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing approaches to skill-augmented agents often optimize skill selection, execution, and distillation in isolation or with separate reward sources, which leads to partial and conflicting evolution of the skill library. This work proposes Skill1, a unified reinforcement learning framework that trains a single policy to generate a retrieval query, re-rank the candidates, execute the task conditioned on the selected skill, and distill a new skill from the trajectory, all guided by a single task-outcome signal. The low-frequency trend of that signal credits skill selection, and its high-frequency variation credits distillation. On the ALFWorld and WebShop benchmarks, the method outperforms prior skill-based and reinforcement learning baselines. Ablations show that removing any credit signal degrades the co-evolution of the agent’s skill selection, utilization, and distillation.
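The four stages named in the summary (query, re-rank, execute, distill) can be pictured as a single episode loop. The sketch below is only an illustration of that control flow. `Skill`, `SkillLibrary`, `ToyPolicy`, and `run_episode` are hypothetical names, the word-overlap retrieval is an assumption, and every policy method is a hand-written stub rather than a learned model. Nothing here is taken from the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Skill:
    name: str
    text: str


@dataclass
class SkillLibrary:
    skills: List[Skill] = field(default_factory=list)

    def search(self, query: str, k: int = 3) -> List[Skill]:
        # Toy retrieval by word overlap (an assumption; the source does not
        # say how the library is searched).
        q = set(query.lower().split())
        scored = [(len(q & set(s.text.lower().split())), s) for s in self.skills]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [s for _, s in scored[:k]]

    def add(self, skill: Skill) -> None:
        self.skills.append(skill)


class ToyPolicy:
    """Stand-in for the single policy; all four methods are stubs."""

    def make_query(self, task: str) -> str:
        return task

    def rerank(self, task: str, candidates: List[Skill]) -> Optional[Skill]:
        # Pick the top retrieved candidate, or no skill if none was found.
        return candidates[0] if candidates else None

    def act(self, task: str, skill: Optional[Skill]) -> Tuple[List[str], float]:
        first = f"use:{skill.name}" if skill else "explore"
        trajectory = [first, f"solve:{task}"]
        outcome = 1.0 if skill else 0.0  # toy rule: succeed only with a skill
        return trajectory, outcome

    def distill(self, task: str, trajectory: List[str]) -> Skill:
        return Skill(name="skill_for_" + task.replace(" ", "_"), text=task)


def run_episode(policy: ToyPolicy, library: SkillLibrary, task: str) -> float:
    query = policy.make_query(task)               # 1. generate a search query
    candidates = library.search(query)            #    retrieve candidates
    selected = policy.rerank(task, candidates)    # 2. re-rank and select one
    trajectory, outcome = policy.act(task, selected)  # 3. solve the task
    library.add(policy.distill(task, trajectory))     # 4. distill a new skill
    return outcome  # the only learning signal in this sketch
```

In this toy setup, a first task run against an empty library finds no skill, and a later task that shares words with it retrieves the skill distilled from the first episode. The point is only that one policy object owns all four steps and that a single scalar outcome is returned per episode.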

📝 Abstract

A persistent skill library allows language model agents to reuse successful strategies across tasks. Maintaining such a library requires three coupled capabilities. The agent selects a relevant skill, utilizes it during execution, and distills new skills from experience. Existing methods optimize these capabilities in isolation or with separate reward sources, resulting in partial and conflicting evolution. We propose Skill1, a framework that trains a single policy to co-evolve skill selection, utilization, and distillation toward a shared task-outcome objective. The policy generates a query to search the skill library, re-ranks candidates to select one, solves the task conditioned on it, and distills a new skill from the trajectory. All learning derives from a single task-outcome signal. Its low-frequency trend credits selection and its high-frequency variation credits distillation. Experiments on ALFWorld and WebShop show that Skill1 outperforms prior skill-based and reinforcement learning baselines. Training dynamics confirm the co-evolution of the three capabilities, and ablations show that removing any credit signal degrades the evolution.
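The abstract states that the low-frequency trend of the task-outcome signal credits selection and its high-frequency variation credits distillation, but it does not say how the two components are separated. One way to picture such a split, purely as an assumption, is an exponential moving average for the trend with the per-episode residual as the fast component. The function name `split_outcome_signal` and the smoothing factor `alpha` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def split_outcome_signal(
    outcomes: List[float], alpha: float = 0.2
) -> Tuple[List[float], List[float]]:
    """Split per-episode outcomes into a slow trend and a fast residual.

    trend[t] is an exponential moving average of the outcomes (assumed filter);
    residual[t] = outcomes[t] - trend[t], so the two parts sum to the outcome.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    trend: List[float] = []
    residual: List[float] = []
    ema = None
    for r in outcomes:
        # Seed the average with the first outcome, then update it smoothly.
        ema = r if ema is None else (1.0 - alpha) * ema + alpha * r
        trend.append(ema)
        residual.append(r - ema)
    return trend, residual


# Example: outcomes for five episodes that used the same selected skill.
trend, residual = split_outcome_signal([0.0, 0.0, 1.0, 1.0, 1.0], alpha=0.5)
```

Under this assumed split, `trend` would play the role of the selection credit and `residual` the role of the distillation credit. The mapping of trend to selection and variation to distillation follows the abstract; the choice of filter does not.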

Problem

Research questions and friction points this paper is trying to address.

skill library

reinforcement learning

skill selection

skill distillation

co-evolution

Innovation

Methods, ideas, or system contributions that make the work stand out.

skill library

co-evolution

reinforcement learning