🤖 AI Summary
This work addresses the challenge of context window saturation caused by loading large-scale agent skill libraries, which leads to high token costs, hallucination, and latency. The authors propose a structured retrieval mechanism at inference time that dynamically fetches dependency-aware, compact skill sets. This approach leverages an offline-constructed executable skill graph and combines hybrid semantic-lexical seeding, reverse-weighted Personalized PageRank, and a context-budgeted hydration strategy during retrieval. By introducing dependency-aware graph structures into skill retrieval, the method significantly improves both efficiency and accuracy. Experiments demonstrate an average reward improvement of 43.6% and a 37.8% reduction in input tokens on SkillsBench and ALFWorld, with consistent generalization across diverse models including Claude Sonnet, GPT-5.2 Codex, and MiniMax.
📝 Abstract
Skill usage has become a core component of modern agent systems and can substantially improve agents' ability to complete complex tasks. In real-world settings, where agents must monitor and interact with numerous personal applications, web browsers, and other environment interfaces, skill libraries can scale to thousands of reusable skills. Scaling to larger skill sets introduces a key challenge: loading the full skill set saturates the context window, driving up token costs, hallucination rates, and latency.
In this paper, we present Graph of Skills (GoS), an inference-time structural retrieval layer for large skill libraries. GoS constructs an executable skill graph offline from skill packages, then at inference time retrieves a bounded, dependency-aware skill bundle through hybrid semantic-lexical seeding, reverse-weighted Personalized PageRank, and context-budgeted hydration. On SkillsBench and ALFWorld, GoS improves average reward by 43.6% over the vanilla full skill-loading baseline while reducing input tokens by 37.8%, and generalizes across three model families: Claude Sonnet, GPT-5.2 Codex, and MiniMax. Additional ablation studies across skill libraries ranging from 200 to 2,000 skills further demonstrate that GoS consistently outperforms both vanilla skill loading and simple vector retrieval in balancing reward, token efficiency, and runtime.
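To make the retrieval pipeline concrete, the following is a minimal, illustrative sketch of the three stages the abstract names: seed scores over skills, Personalized PageRank propagation along a dependency graph, and greedy hydration under a token budget. Everything here is an assumption for illustration — the toy skill graph, the seed scores, the plain edge weights (the paper uses a reverse-weighted variant), and the greedy budget policy are not taken from the paper.

```python
# Illustrative sketch of a GoS-style retrieval pipeline.
# The skill graph, seed scores, and weights below are invented for this example.

def personalized_pagerank(graph, seeds, alpha=0.85, iters=50):
    """Power-iteration Personalized PageRank.
    `graph[u]` maps dependency neighbors -> edge weights; restart mass
    is concentrated on the seed skills (the hybrid-retrieval hits)."""
    nodes = list(graph)
    total = sum(seeds.values())
    restart = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * restart[n] for n in nodes}
        for u in nodes:
            out = graph[u]
            w_sum = sum(out.values())
            if w_sum == 0:
                # Dangling node: return its mass to the seed distribution.
                for n in nodes:
                    nxt[n] += alpha * rank[u] * restart[n]
                continue
            for v, w in out.items():
                nxt[v] += alpha * rank[u] * w / w_sum
        rank = nxt
    return rank

def hydrate(ranked, sizes, budget):
    """Context-budgeted hydration: load skills in rank order while
    their token footprints still fit inside the budget."""
    bundle, used = [], 0
    for skill in sorted(ranked, key=ranked.get, reverse=True):
        if used + sizes[skill] <= budget:
            bundle.append(skill)
            used += sizes[skill]
    return bundle

# Toy dependency graph: "browse" depends on "http" and "parse_html", etc.
graph = {
    "browse": {"http": 1.0, "parse_html": 1.0},
    "http": {},
    "parse_html": {"http": 0.5},
    "calendar": {"http": 1.0},
}
seeds = {"browse": 0.9, "calendar": 0.1}  # hypothetical semantic+lexical scores
sizes = {"browse": 300, "http": 120, "parse_html": 200, "calendar": 250}

bundle = hydrate(personalized_pagerank(graph, seeds), sizes, budget=650)
print(bundle)  # the bundle pulls in "browse" plus its dependencies
```

Note how the PageRank step makes the bundle dependency-aware: "http" and "parse_html" enter the bundle because mass flows to them from the seed "browse", even though neither matched the query directly, while the weakly seeded "calendar" is dropped once the budget is exhausted.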