SkillEvolver: Skill Learning as a Meta-Skill

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing agent skills are typically static and cannot improve during real-world deployment. This work proposes SkillEvolver, a lightweight, plug-and-play framework for online skill learning that treats skill acquisition itself as a reusable meta-skill. Using the failures that another agent encounters while running a deployed skill, SkillEvolver iteratively refines the skill's prose and code without retraining the model. A fresh-agent overfit audit governs each refinement round, catching possible leakage as well as deployed-skill failures such as silent bypass, where a skill looks valid in content but is never invoked at runtime. The meta-skill loads through the same interface as any other skill on protocol-compliant CLI agents. On 83 SkillsBench tasks, SkillEvolver reaches 56.8% accuracy versus 43.6% for curated human skills and 29.9% for a no-skill baseline. On three GPU kernel optimization tasks from KernelBench, it raises mean speedup from 1.16× to 1.51×.
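To put the reported numbers side by side, the short calculation below derives the absolute accuracy gains (in percentage points) and the relative change in mean speedup. It uses only the figures quoted in the summary; the variable names are illustrative and do not come from the paper.

```python
# Figures quoted in the summary (accuracy in %, speedup as a multiplier).
skillevolver_acc, human_acc, no_skill_acc = 56.8, 43.6, 29.9
speedup_before, speedup_after = 1.16, 1.51

# Absolute accuracy gains in percentage points.
gain_vs_human = round(skillevolver_acc - human_acc, 1)
gain_vs_no_skill = round(skillevolver_acc - no_skill_acc, 1)

# Relative improvement of the mean KernelBench speedup, in percent.
relative_speedup_gain = round((speedup_after / speedup_before - 1) * 100, 1)

print(f"vs curated human skills: +{gain_vs_human} points")
print(f"vs no-skill baseline:    +{gain_vs_no_skill} points")
print(f"mean speedup gain:       +{relative_speedup_gain}%")
```

These are derived quantities only; they say nothing about variance across tasks or runs, which the summary does not report.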

📝 Abstract

Agent skills today are static artifacts: authored once -- by human curation or one-shot generation from parametric knowledge -- and then consumed unchanged, with no mechanism to improve from real use. We propose \textbf{SkillEvolver}, a lightweight, plug-and-play solution for online skill learning, in which a single meta-skill iteratively authors, deploys, and refines domain-specific skills. The learning target of SkillEvolver is the skill's prose and code, not model weights, so the resulting artifact drops into any agent without retraining; the meta-skill itself is just another skill, loaded through the same interface by any protocol-compliant CLI agent. Unlike trace distillation, the meta-skill refines only after deploying the learnt skill, so the learning signal comes from failures another agent encounters while using it -- not from exploratory traces alone. Refinement iterations are governed by a fresh-agent overfit audit that catches possible leakage as well as deployed-skill-specific failures, including the silent-bypass mode in which a skill appears valid in content but is never invoked at runtime. On $83$ SkillsBench tasks spanning $15^{+}$ domains, SkillEvolver reaches $56.8\%$ accuracy versus $43.6\%$ for curated human skills and $29.9\%$ for the no-skill baseline; on three GPU kernel optimization tasks from KernelBench, it also raises mean speedup from $1.16\times$ to $1.51\times$.
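The author, deploy, audit, refine cycle described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the paper's implementation: `Skill`, `DeployResult`, `audit`, `evolve`, and the `toy_*` functions are hypothetical names, the audit is reduced to simple rule checks (the paper uses a fresh agent), and deployment is simulated with string matching rather than a real CLI agent.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Skill:
    prose: str        # documentation the consuming agent reads
    code: str         # helper code shipped with the skill
    version: int = 0


@dataclass
class DeployResult:
    task_id: str
    passed: bool
    skill_invoked: bool   # False means the agent never used the skill


def audit(skill: Skill, results: List[DeployResult], held_out: List[str]) -> List[str]:
    """Hypothetical rule-based stand-in for the fresh-agent overfit audit."""
    issues = []
    for r in results:
        if not r.skill_invoked:
            issues.append(f"silent-bypass:{r.task_id}")
        elif not r.passed:
            issues.append(f"failure:{r.task_id}")
    # Leakage check: held-out answers must not appear in the skill text.
    text = skill.prose + skill.code
    issues += [f"leakage:{a}" for a in held_out if a in text]
    return issues


def evolve(author: Callable, deploy: Callable, refine: Callable,
           tasks: List[str], held_out: List[str],
           max_rounds: int = 3) -> Tuple[Skill, List[List[str]]]:
    """Refine only after deployment, using issues found in deployed use."""
    skill = author(tasks)
    log = []
    for _ in range(max_rounds):
        issues = audit(skill, deploy(skill, tasks), held_out)
        log.append(issues)
        if not issues:
            break
        skill = refine(skill, issues)
    return skill, log


# Toy stand-ins (assumptions): the skill is "invoked" only if its prose has a
# trigger phrase, and tasks "pass" only if the code strips whitespace.
def toy_author(tasks):
    return Skill("Notes on CSV parsing.", "def parse(x): return x")


def toy_deploy(skill, tasks):
    invoked = "Use when" in skill.prose
    passed = invoked and "strip" in skill.code
    return [DeployResult(t, passed, invoked) for t in tasks]


def toy_refine(skill, issues):
    if any(i.startswith("silent-bypass") for i in issues):
        return Skill("Use when parsing CSV. " + skill.prose, skill.code, skill.version + 1)
    return Skill(skill.prose, "def parse(x): return x.strip()", skill.version + 1)


final_skill, audit_log = evolve(toy_author, toy_deploy, toy_refine,
                                tasks=["t1", "t2"], held_out=["42"])
print(final_skill.version, audit_log)
```

In this toy run the first round surfaces a silent bypass (the prose never tells the agent when to use the skill), the second surfaces task failures, and the third passes the audit. The ordering is the point of the sketch: every refinement is driven by issues observed after deployment, and only the skill's prose and code change.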

Problem

Research questions and friction points this paper is trying to address.

skill learning

online skill evolution

agent skills

skill refinement

meta-skill

Innovation

Methods, ideas, or system contributions that make the work stand out.

SkillEvolver

meta-skill

online skill learning