SkillGen: Verified Inference-Time Agent Skill Synthesis

📅 2026-05-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of automatically generating high-quality, controllable, and reusable agent skills without retraining large language models. It introduces SkillGen, a multi-agent framework that models skills as verifiable interventions. SkillGen applies contrastive induction over successful and failed trajectories from a base agent to distill reusable skill logic into an auditable, human-inspectable skill artifact. It then measures each generated skill's net effect by comparing outcomes on the same task instances with and without the skill, counting both repairs (baseline failures the skill fixes) and regressions (baseline successes the skill breaks). Across a range of agents and datasets, SkillGen improves held-out performance, outperforms existing skill-generation baselines, and produces skills that transfer across models.
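To make the idea of contrasting successful and failed trajectories concrete, here is a toy sketch. It is not SkillGen's induction mechanism, which the summary describes only at a high level. It assumes, purely for illustration, that each trajectory has already been reduced to a list of behavior labels, and it ranks behaviors by how much more often they occur in successes than in failures. The function name `contrast_behaviors`, the `min_support` threshold, and the example labels are all hypothetical.

```python
from collections import Counter


def contrast_behaviors(successes, failures, min_support=0.5):
    """Rank behaviors that are common in successes but rarer in failures.

    Assumption: each trajectory is a list of behavior labels (strings).
    Returns (behavior, gap) pairs, where gap is the difference between the
    fraction of successes and the fraction of failures showing the behavior.
    """
    def support(trajectories):
        # Fraction of trajectories in which each behavior appears at least once.
        if not trajectories:
            return {}
        counts = Counter(b for t in trajectories for b in set(t))
        return {b: c / len(trajectories) for b, c in counts.items()}

    in_success = support(successes)
    in_failure = support(failures)
    gaps = {
        b: s - in_failure.get(b, 0.0)
        for b, s in in_success.items()
        if s >= min_support
    }
    # Keep only behaviors over-represented in successes, largest gap first.
    return sorted(
        ((b, g) for b, g in gaps.items() if g > 0),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical trajectories, invented for illustration.
successes = [
    ["read_spec", "run_tests", "submit"],
    ["read_spec", "run_tests", "submit"],
]
failures = [
    ["read_spec", "submit"],
    ["submit"],
]
ranked = contrast_behaviors(successes, failures)
print(ranked)  # [('run_tests', 1.0), ('read_spec', 0.5)]
```

In this toy data, `submit` appears in every trajectory and so carries no contrastive signal, while `run_tests` appears only in successes and ranks first.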

📝 Abstract

Skills are a promising way to improve LLM agent capabilities without retraining, while keeping the added procedure reusable and controllable. However, high-quality skills are still largely written by hand. We introduce SkillGen, a multi-agent framework that synthesizes a single auditable skill from trajectories generated by a base agent. The output is a human-readable artifact that can be inspected before use. Rather than merely summarizing trajectories, SkillGen leverages contrastive induction over both successful and failed trajectories to identify reusable success patterns, recurring failure modes, and behaviors that appear in nearby successes but are missing from failures. SkillGen then generates candidate skills and iteratively refines the skill. A key novelty in SkillGen is that we model agent skills as interventions to empirically verify the net effect of skills on the overall performance. Specifically, we compare outcomes on the same instances with and without the skill, so that we account for both repairs (cases where the skill fixes a baseline failure) and regressions (cases where the skill breaks a baseline success). Across a broad range of agents and datasets, SkillGen consistently improves held-out performance, outperforms existing skill-generation baselines, and produces skills that transfer across models.
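The paired comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' evaluation code. It assumes each run is summarized as a mapping from a task-instance ID to a success flag. The names `NetEffect` and `net_effect`, and the choice to report net gain as (repairs − regressions) divided by the number of paired instances, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetEffect:
    repairs: int      # baseline failed, run with the skill succeeded
    regressions: int  # baseline succeeded, run with the skill failed
    total: int        # number of instances present in both runs

    @property
    def net_gain(self) -> float:
        """Change in success rate over the paired instances."""
        if self.total == 0:
            return 0.0
        return (self.repairs - self.regressions) / self.total


def net_effect(baseline: dict[str, bool], with_skill: dict[str, bool]) -> NetEffect:
    """Compare outcomes on the instances that both runs share."""
    shared = baseline.keys() & with_skill.keys()
    repairs = sum(1 for i in shared if not baseline[i] and with_skill[i])
    regressions = sum(1 for i in shared if baseline[i] and not with_skill[i])
    return NetEffect(repairs, regressions, len(shared))


# Hypothetical outcomes, invented for illustration (True = task solved).
baseline = {"t1": True, "t2": False, "t3": False, "t4": True}
with_skill = {"t1": True, "t2": True, "t3": True, "t4": False}

effect = net_effect(baseline, with_skill)
print(effect.repairs, effect.regressions, effect.net_gain)  # 2 1 0.25
```

Reporting repairs and regressions separately matters because a skill with a positive net gain can still break tasks the base agent already solved, which a single aggregate accuracy number would hide.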

Problem

Research questions and friction points this paper is trying to address.

skill synthesis

LLM agents

inference-time adaptation

verified interventions

trajectory analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

skill synthesis

contrastive induction

intervention-based verification