SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two problems for large language model agents: trajectories from long-horizon tasks are noisy, and open skill ecosystems contain redundant, environment-sensitive skills of uneven quality. The authors propose SkillsVote, a framework presented as the first end-to-end governance system covering skill collection, recommendation, and evolution. It combines structured skill retrieval, trajectory decomposition, multi-factor attribution, environment-adaptability analysis, verifiable task synthesis, and an evidence-gated update mechanism, so that only well-supported changes reach the skill library. In the reported experiments, governing an external skill repository alone, with the model frozen, improves GPT-5.2 by up to 7.9 percentage points on Terminal-Bench 2.0 (offline evolution) and by up to 2.6 percentage points on SWE-Bench Pro (online evolution), which the authors take as evidence that the approach is effective across both benchmarks.

📝 Abstract

Long-horizon LLM agents leave traces that could become reusable experience, but raw trajectories are noisy and hard to govern. We treat Agent Skills as an experience schema that couples executable scripts with non-executable guidance on procedures. Yet open skill ecosystems contain redundant, uneven, environment-sensitive artifacts, and indiscriminate updates can pollute future context. We present SkillsVote, a lifecycle-governance framework for Agent Skills from collection and recommendation to evolution. SkillsVote profiles a million-scale open-source corpus for environment requirements, quality, and verifiability, then synthesizes tasks for verifiable skills. Before execution, SkillsVote performs agentic library search over a structured skill library to expose instructional skill context. After execution, it decomposes trajectories into skill-linked subtasks, attributes outcomes to skill use, agent exploration, environment, and result signals, and admits only successful reusable discoveries to evidence-gated updates. In our evaluation, offline evolution improves GPT-5.2 on Terminal-Bench 2.0 by up to 7.9 pp, while online evolution improves SWE-Bench Pro by up to 2.6 pp. Overall, governed external skill libraries can improve frozen agents without model updates when systems control exposure, credit, and preservation.
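
The post-execution step described in the abstract (decompose, attribute, then gate updates on evidence) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Subtask` record, the `Attribution` enum, the `admit_for_update` function, the reading of "discovery" as an outcome attributed to agent exploration, and the `min_evidence` threshold are all assumptions introduced here for illustration, as are the skill names in the example trace.

```python
from dataclasses import dataclass
from enum import Enum


class Attribution(Enum):
    # The four outcome factors named in the abstract.
    SKILL_USE = "skill_use"
    AGENT_EXPLORATION = "agent_exploration"
    ENVIRONMENT = "environment"
    RESULT_SIGNAL = "result_signal"


@dataclass(frozen=True)
class Subtask:
    # Hypothetical record for one skill-linked subtask of a trajectory.
    skill_id: str
    succeeded: bool
    attribution: Attribution
    reusable: bool


def admit_for_update(subtasks, min_evidence=2):
    """Return skill ids backed by at least `min_evidence` qualifying subtasks.

    Assumption: a subtask qualifies only if it succeeded, is reusable, and
    its outcome is attributed to agent exploration (a "discovery").
    """
    counts = {}
    for st in subtasks:
        is_discovery = st.attribution is Attribution.AGENT_EXPLORATION
        if st.succeeded and st.reusable and is_discovery:
            counts[st.skill_id] = counts.get(st.skill_id, 0) + 1
    return sorted(sid for sid, n in counts.items() if n >= min_evidence)


# Hypothetical trace: only "git-bisect" has two successful reusable discoveries.
trace = [
    Subtask("git-bisect", True, Attribution.AGENT_EXPLORATION, True),
    Subtask("git-bisect", True, Attribution.AGENT_EXPLORATION, True),
    Subtask("docker-build", True, Attribution.ENVIRONMENT, True),
    Subtask("pytest-fixtures", False, Attribution.AGENT_EXPLORATION, True),
]
print(admit_for_update(trace))  # ['git-bisect']
```

Requiring repeated evidence before an update is one simple way to keep a single lucky or environment-specific success from polluting future context; the actual gating criteria in SkillsVote may differ.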

Problem

Research questions and friction points this paper is trying to address.

Agent Skills

Lifecycle Governance

Skill Evolution

Experience Reuse

Trajectory Decomposition

Innovation

Methods, ideas, or system contributions that make the work stand out.

Agent Skills

Lifecycle Governance

SkillsVote