SkillOpt: Executive Strategy for Self-Evolving Agent Skills

📅 2026-05-22

🤖 AI Summary

Existing agent skills are hand-crafted, generated in one shot, or evolved through loosely controlled self-revision, so they rarely improve reliably under feedback. This work proposes SkillOpt, which the authors describe as, to their knowledge, the first systematic, controllable text-space optimizer for agent skills. It treats the skill as the external state of a frozen agent: a separate optimizer model turns scored rollouts into bounded add/delete/replace edits on a single skill document, and an edit is accepted only when it strictly improves a held-out validation score. A textual learning-rate budget, a rejected-edit buffer, and an epoch-wise slow/meta update keep training stable while adding zero inference-time model calls at deployment. Across six benchmarks, seven target models, and three execution harnesses (direct chat, Codex, Claude Code), SkillOpt is best or tied on all 52 evaluated (model, benchmark, harness) cells. On GPT-5.5 it raises average accuracy over the no-skill baseline by 19.1 to 24.8 points depending on the harness, and the optimized skills retain value when moved across model scales, between the Codex and Claude Code harnesses, and to a nearby math benchmark without further optimization.

📝 Abstract

Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision, none of which behaves like a deep-learning optimizer for the skill, and none of which reliably improves over its starting point under feedback. We argue the skill should instead be trained as the external state of a frozen agent, with the same discipline that makes weight-space optimization reproducible. SkillOpt is, to our knowledge, the first systematic controllable text-space optimizer for agent skills: a separate optimizer model turns scored rollouts into bounded add/delete/replace edits on a single skill document, and an edit is accepted only when it strictly improves a held-out validation score. A textual learning-rate budget, rejected-edit buffer, and epoch-wise slow/meta update make skill training stable while adding zero inference-time model calls at deployment. Across six benchmarks, seven target models, and three execution harnesses (direct chat, Codex, Claude Code), SkillOpt is best or tied on all 52 evaluated (model, benchmark, harness) cells and beats every per-cell competitor among human, one-shot LLM, Trace2Skill, TextGrad, GEPA, and EvoSkill skills. On GPT-5.5 it lifts the average no-skill accuracy by +23.5 points in direct chat, by +24.8 inside the Codex agentic loop, and by +19.1 inside Claude Code. Transfer experiments further show that optimized skill artifacts retain value when moved across model scales, between Codex and Claude Code execution environments, and to a nearby math benchmark without further optimization.
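
The accept-only-on-strict-improvement loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Edit`, `apply_edit`, `train_skill`, `toy_propose`, `toy_validate`) is hypothetical, the optimizer model is replaced by a deterministic stub, the held-out validation score is a toy phrase-coverage function, and the "textual learning-rate budget" is assumed to be a simple cap on the number of edits per step. The epoch-wise slow/meta update is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Edit:
    """One bounded edit on the skill document (hypothetical representation)."""
    op: str          # "add", "delete", or "replace"
    index: int       # line position the edit targets
    text: str = ""   # new line for add/replace; unused for delete


def apply_edit(skill: List[str], edit: Edit) -> List[str]:
    """Return a new skill document with one edit applied; the input is not mutated."""
    lines = list(skill)
    if edit.op == "add":
        lines.insert(edit.index, edit.text)
    elif edit.op == "delete":
        del lines[edit.index]
    elif edit.op == "replace":
        lines[edit.index] = edit.text
    else:
        raise ValueError(f"unknown edit op: {edit.op}")
    return lines


def train_skill(
    skill: List[str],
    propose: Callable[[List[str], List[Edit]], List[Edit]],
    validate: Callable[[List[str]], float],
    steps: int,
    edit_budget: int,
) -> Tuple[List[str], float, List[Edit]]:
    """Accept a batch of edits only if it strictly improves the validation score."""
    best_score = validate(skill)
    rejected: List[Edit] = []  # assumed role of the rejected-edit buffer
    for _ in range(steps):
        # Assumption: the budget simply caps how many edits one step may apply.
        edits = propose(skill, rejected)[:edit_budget]
        candidate = skill
        for edit in edits:
            candidate = apply_edit(candidate, edit)
        score = validate(candidate)
        if score > best_score:       # strict improvement required
            skill, best_score = candidate, score
        else:
            rejected.extend(edits)   # remembered so the proposer can avoid them
    return skill, best_score, rejected


# Toy stand-ins for the optimizer model and the held-out validation set.
REQUIRED = {"check units", "show work"}


def toy_validate(skill: List[str]) -> float:
    """Fraction of required phrases present in the skill document."""
    return len(REQUIRED & set(skill)) / len(REQUIRED)


def toy_propose(skill: List[str], rejected: List[Edit]) -> List[Edit]:
    """Propose appending the first candidate line not yet present or rejected."""
    tried = {edit.text for edit in rejected}
    for text in ["be verbose", "check units", "show work"]:
        if text not in skill and text not in tried:
            return [Edit("add", len(skill), text)]
    return []


final_skill, final_score, rejected_edits = train_skill(
    ["solve the problem"], toy_propose, toy_validate, steps=4, edit_budget=1
)
print(final_skill, final_score, [edit.text for edit in rejected_edits])
```

In this toy run the unhelpful line "be verbose" leaves the score unchanged, so it is rejected and recorded in the buffer, while the two lines that raise the validation score are kept. Because the skill document is the only thing that changes, deploying the result requires no extra model calls, which matches the deployment property the abstract claims.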

Problem

Research questions and friction points this paper is trying to address.

agent skills

skill optimization

text-space optimization

self-evolving agents

reproducible skill improvement

Innovation

Methods, ideas, or system contributions that make the work stand out.

Skill Optimization

Text-space Optimization

Self-Evolving Agents