Organizing, Orchestrating, and Benchmarking Agent Skills at Ecosystem Scale

📅 2026-03-02

🤖 AI Summary

This work addresses how to efficiently manage and scale large agent skill ecosystems. It proposes AgentSkillOS, a framework presented as the first systematic architecture for ecosystem-level skill management. The framework organizes skills into a capability tree and orchestrates multiple skills collaboratively through directed acyclic graphs (DAGs). Key technical components include recursive node-level categorization, capability tree construction, DAG-based task pipelines, LLM-based pairwise evaluation, and score aggregation with a Bradley-Terry model. Experiments on a 30-task benchmark at three skill-ecosystem scales, from 200 to 200,000 skills, show that tree-based retrieval closely approximates oracle skill selection and that DAG-based orchestration substantially outperforms flat invocation given the same skill set, with output quality measured by unified Bradley-Terry scores.
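
The DAG-based orchestration idea can be illustrated with a minimal sketch. It assumes each skill is a plain function that receives the outputs of its upstream skills and returns an artifact; the skill names, this signature, and `run_dag` are hypothetical and are not taken from the AgentSkillOS repository. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) so that a skill runs only after all of its parents have finished, and receives their outputs as inputs.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# Hypothetical skill signature: parent outputs in, one artifact out.
Skill = Callable[[Dict[str, str]], str]


def run_dag(skills: Dict[str, Skill], deps: Dict[str, List[str]]) -> Dict[str, str]:
    """Execute skills in dependency order; each skill sees only its parents' outputs."""
    sorter = TopologicalSorter(deps)  # deps maps skill -> list of prerequisite skills
    sorter.prepare()                  # raises graphlib.CycleError if deps is not a DAG
    outputs: Dict[str, str] = {}
    while sorter.is_active():
        # All skills returned here have their prerequisites satisfied.
        for name in sorter.get_ready():
            parents = {p: outputs[p] for p in deps.get(name, [])}
            outputs[name] = skills[name](parents)
            sorter.done(name)
    return outputs


# Illustrative three-skill pipeline (hypothetical skills).
skills = {
    "fetch_data": lambda up: "table.csv",
    "make_chart": lambda up: f"chart({up['fetch_data']})",
    "write_report": lambda up: f"report({up['fetch_data']}, {up['make_chart']})",
}
deps = {"make_chart": ["fetch_data"], "write_report": ["fetch_data", "make_chart"]}
result = run_dag(skills, deps)
print(result["write_report"])  # report(table.csv, chart(table.csv))
```

Skills returned together by `get_ready()` do not depend on one another, so a real orchestrator could run them concurrently; this sketch runs them sequentially for simplicity.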

📝 Abstract

The rapid proliferation of Claude agent skills has raised the central question of how to effectively leverage, manage, and scale the agent skill ecosystem. In this paper, we propose AgentSkillOS, the first principled framework for skill selection, orchestration, and ecosystem-level management. AgentSkillOS comprises two stages: (i) Manage Skills, which organizes skills into a capability tree via node-level recursive categorization for efficient discovery; and (ii) Solve Tasks, which retrieves, orchestrates, and executes multiple skills through DAG-based pipelines. To evaluate the agent's ability to invoke skills, we construct a benchmark of 30 artifact-rich tasks across five categories: data computation, document creation, motion video, visual design, and web interaction. We assess the quality of task outputs using LLM-based pairwise evaluation, and the results are aggregated via a Bradley-Terry model to produce unified quality scores. Experiments across three skill ecosystem scales (200 to 200K skills) show that tree-based retrieval effectively approximates oracle skill selection, and that DAG-based orchestration substantially outperforms native flat invocation even when given the identical skill set. Our findings confirm that structured composition is the key to unlocking skill potential. Our GitHub repository is available at: https://github.com/ynulihao/AgentSkillOS.
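
The Bradley-Terry aggregation step can be sketched as follows. Under the Bradley-Terry model, system i's output is preferred over system j's with probability p_i / (p_i + p_j), and the strengths p_i can be fitted with the minorization-maximization (MM) update p_i ← W_i / Σ_j n_ij / (p_i + p_j), where W_i counts i's wins and n_ij counts comparisons between i and j. This is a generic fit, not the authors' code; the `(a, b, s)` judgment format, counting ties as half a win, and reporting log-strengths as scores are all assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# One pairwise judgment: (system_a, system_b, s) with s = 1.0 if a's output was judged
# better, 0.0 if b's was, and 0.5 for a tie (ties as half a win: an assumption).
Judgment = Tuple[str, str, float]


def bradley_terry(judgments: List[Judgment], iters: int = 500, tol: float = 1e-10) -> Dict[str, float]:
    """Fit Bradley-Terry strengths with the MM update; return log-strengths as scores."""
    wins: Dict[str, float] = defaultdict(float)               # W_i: total (fractional) wins
    games: Dict[Tuple[str, str], float] = defaultdict(float)  # n_ij: comparisons of i vs j
    for a, b, s in judgments:
        wins[a] += s
        wins[b] += 1.0 - s
        games[(a, b)] += 1.0
        games[(b, a)] += 1.0
    items = sorted(wins)
    if any(wins[i] <= 0 for i in items):
        raise ValueError("every system needs at least one win or tie for the MLE to exist")
    p = {i: 1.0 for i in items}
    for _ in range(iters):
        # MM step: p_i <- W_i / sum_j n_ij / (p_i + p_j)
        new_p = {
            i: wins[i] / sum(n / (p[i] + p[j]) for (x, j), n in games.items() if x == i)
            for i in items
        }
        # Strengths are defined only up to a common factor; rescale so they average 1.
        scale = len(items) / sum(new_p.values())
        new_p = {i: v * scale for i, v in new_p.items()}
        converged = max(abs(new_p[i] - p[i]) for i in items) < tol
        p = new_p
        if converged:
            break
    return {i: math.log(v) for i, v in p.items()}


# Made-up judgments for illustration only (not results from the paper).
judgments = [
    ("dag", "flat", 1.0), ("dag", "flat", 1.0), ("flat", "dag", 1.0),
    ("dag", "no_skills", 0.5), ("no_skills", "flat", 1.0), ("flat", "no_skills", 1.0),
]
scores = bradley_terry(judgments)
```

The fit needs every system to win or tie at least once, and the comparison graph to be connected; otherwise some strengths drift toward zero or infinity, or the relative scale between disconnected groups is not identifiable.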

Problem

Research questions and friction points this paper is trying to address.

agent skills

skill ecosystem

skill orchestration

capability organization

task automation

Innovation

Methods, ideas, or system contributions that make the work stand out.

AgentSkillOS

capability tree

DAG-based orchestration