Evidence Over Plans: Online Trajectory Verification for Skill Distillation

📅 2026-05-09

🤖 AI Summary
Existing skill generation methods rely on prior planning and lack empirical evidence from environment interaction, which compromises skill quality and limits task success rates. This work proposes SPARK, a framework that distills skills a posteriori from environment interaction trajectories and uses online verification to keep generated skills aligned with the task environment. Its core innovation is the Posterior Distillation Index (PDI), a trajectory-level measure of how well a skill is grounded in task-environment evidence, which drives skill diagnosis and targeted intervention. Across 86 runnable tasks, SPARK-generated skills consistently surpass no-skill baselines and outperform human-written skills on student models whose inference cost is up to three orders of magnitude lower than that of teacher models.
📝 Abstract
Agent skills can markedly improve task success rates by drawing on human-written procedural documents, but their quality is difficult to assess without environment-grounded verification. Existing skill generation methods rely heavily on preference logs rather than direct environment interaction, and often yield negligible or even negative gains. We identify this as a fundamental timing bottleneck: robust skills should be posterior-based, distilled from empirical environment interaction rather than from prior plans. In this study, we introduce the Posterior Distillation Index (PDI), a trajectory-level metric that quantifies how well a distilled skill is grounded in task-environment evidence. To operationalize PDI, we present SPARK (Structured Pipelines for Autonomous Runnable tasKs and sKill generation), which preserves task execution evidence for full trajectory-level analysis. SPARK generates the environment-verified trajectories used to compute PDI, and it applies PDI as an online diagnostic and intervention signal to ensure posterior skill formation. Across 86 runnable tasks, SPARK-generated skills consistently surpass no-skill baselines and outperform human-written skills on student models (whose inference cost is up to 1,000x lower than that of teacher models). These findings show that PDI-guided distillation produces efficient and transferable skills grounded in task-environment interaction. We release our code at https://github.com/EtaYang10th/spark-skills .
Problem

Research questions and friction points this paper is trying to address.

skill distillation
trajectory verification
environment-grounded skills
posterior-based learning
task execution evidence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Posterior Distillation Index
Skill Distillation
Environment-Grounded Verification
Online Trajectory Verification
SPARK