SkillGen: Verified Inference-Time Agent Skill Synthesis

📅 2026-05-09

🤖 AI Summary
This work addresses the problem of automatically generating high-quality, controllable, and reusable agent skills without retraining the underlying large language model. It introduces SkillGen, a multi-agent framework that treats skills as verifiable interventions. SkillGen applies contrastive induction over a base agent's successful and failed trajectories to distill skill logic into a single auditable, human-readable skill. It then measures each skill's net effect by running the same task instances with and without the skill, counting both repairs (baseline failures the skill fixes) and regressions (baseline successes the skill breaks). Across a range of agents and datasets, SkillGen consistently improves held-out performance, outperforms existing skill-generation baselines, and produces skills that transfer across models.
📝 Abstract
Skills are a promising way to improve LLM agent capabilities without retraining, while keeping the added procedure reusable and controllable. However, high-quality skills are still largely written by hand. We introduce SkillGen, a multi-agent framework that synthesizes a single auditable skill from trajectories generated by a base agent. The output is a human-readable artifact that can be inspected before use. Rather than merely summarizing trajectories, SkillGen leverages contrastive induction over both successful and failed trajectories to identify reusable success patterns, recurring failure modes, and behaviors that appear in nearby successes but are missing from failures. SkillGen then generates candidate skills and iteratively refines the skill. A key novelty in SkillGen is that we model agent skills as interventions to empirically verify the net effect of skills on the overall performance. Specifically, we compare outcomes on the same instances with and without the skill, so that we account for both repairs (cases where the skill fixes a baseline failure) and regressions (cases where the skill breaks a baseline success). Across a broad range of agents and datasets, SkillGen consistently improves held-out performance, outperforms existing skill-generation baselines, and produces skills that transfer across models.
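The paired comparison described in the abstract can be illustrated with a minimal sketch. It assumes each run is summarized as a mapping from task-instance ID to a success flag; the names `NetEffect` and `paired_net_effect` and the sample outcomes are hypothetical and are not taken from the paper, whose actual verification procedure may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetEffect:
    repairs: int      # baseline failed, run with skill succeeded
    regressions: int  # baseline succeeded, run with skill failed
    unchanged: int    # same outcome with and without the skill
    total: int        # number of paired task instances

    @property
    def net_gain(self) -> float:
        """Change in success rate between the two runs."""
        if self.total == 0:
            return 0.0
        return (self.repairs - self.regressions) / self.total


def paired_net_effect(baseline: dict[str, bool],
                      with_skill: dict[str, bool]) -> NetEffect:
    """Compare per-instance outcomes on the same task instances."""
    if baseline.keys() != with_skill.keys():
        raise ValueError("both runs must cover the same task instances")
    repairs = sum(1 for k, ok in baseline.items() if not ok and with_skill[k])
    regressions = sum(1 for k, ok in baseline.items() if ok and not with_skill[k])
    total = len(baseline)
    return NetEffect(repairs, regressions, total - repairs - regressions, total)


# Illustrative outcomes only; these are not results from the paper.
baseline = {"t1": False, "t2": True, "t3": False, "t4": True}
with_skill = {"t1": True, "t2": True, "t3": True, "t4": False}
effect = paired_net_effect(baseline, with_skill)
print(effect, effect.net_gain)
```

Counting repairs and regressions separately, rather than comparing two aggregate success rates, keeps a skill that fixes some instances while breaking others from looking harmless when the two effects cancel.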
Problem

Research questions and friction points this paper is trying to address.

skill synthesis
LLM agents
inference-time adaptation
verified interventions
trajectory analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

skill synthesis
contrastive induction
intervention-based verification
LLM agents
trajectory analysis