🤖 AI Summary
This work addresses the performance degradation that large language model agents suffer as their skill repertoire expands, which the authors attribute to the lack of an effective skill orchestration mechanism. They propose GraSP, an architecture that introduces a compilation layer between skill retrieval and execution. GraSP organizes a flat set of skills into a typed directed acyclic graph (DAG) whose edges encode precondition-effect dependencies. It executes the graph with node-level verification and repairs failures with five typed, locality-bounded operators, which the authors report reduces replanning cost from O(N) to O(d^h). Presented as the first executable skill graph architecture, GraSP is evaluated on ALFWorld, ScienceWorld, WebShop, and InterCode with eight LLM backbones, where it outperforms ReAct, Reflexion, ExpeL, and flat skill baselines in every configuration. Reward improves by up to 19 points over the strongest baseline and environment steps fall by up to 41%, and the gains hold under skill over-retrieval and degraded skill quality.
📝 Abstract
Skill ecosystems for LLM agents have matured rapidly, yet recent benchmarks show that providing agents with more skills does not monotonically improve performance -- focused sets of 2-3 skills outperform comprehensive documentation, and excessive skills actually hurt. The bottleneck has shifted from skill availability to skill orchestration: agents need not more skills, but a structural mechanism to select, compose, and execute them with explicit causal dependencies. We propose GraSP, the first executable skill graph architecture that introduces a compilation layer between skill retrieval and execution. GraSP transforms flat skill sets into typed directed acyclic graphs (DAGs) with precondition-effect edges, executes them with node-level verification, and performs locality-bounded repair through five typed operators -- reducing replanning from O(N) to O(d^h). Across ALFWorld, ScienceWorld, WebShop, and InterCode with eight LLM backbones, GraSP outperforms ReAct, Reflexion, ExpeL, and flat skill baselines in every configuration, improving reward by up to +19 points over the strongest baseline while cutting environment steps by up to 41%. GraSP's advantage grows with task complexity and is robust to both skill over-retrieval and quality degradation, confirming that structured orchestration -- not larger skill libraries -- is the key to reliable agent execution.
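To make the compilation idea concrete, here is a minimal Python sketch of turning a flat skill set into a precondition-effect DAG and running it with a per-node check. It is an illustration under stated assumptions, not the authors' implementation: the `Skill` record, the `compile_skill_graph` and `execute` functions, and the set-of-strings state model are all hypothetical. The abstract does not name the five repair operators, so the sketch only marks a failed node as "blocked" instead of repairing it.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Skill:
    """Hypothetical skill record: a name plus symbolic preconditions and effects."""
    name: str
    preconditions: frozenset
    effects: frozenset


def compile_skill_graph(skills, initial_state):
    """Return {skill name: set of predecessor skill names}.

    A skill u becomes a predecessor of v when one of u's effects supplies a
    precondition of v that the initial state does not already satisfy.
    """
    producers = {}
    for s in skills:
        for effect in s.effects:
            producers.setdefault(effect, []).append(s.name)

    deps = {}
    for s in skills:
        deps[s.name] = set()
        for pre in s.preconditions - initial_state:
            for producer in producers.get(pre, []):
                if producer != s.name:
                    deps[s.name].add(producer)
    return deps


def execute(skills, initial_state):
    """Run skills in dependency order, checking preconditions at every node."""
    by_name = {s.name: s for s in skills}
    deps = compile_skill_graph(skills, initial_state)
    # static_order() raises graphlib.CycleError if the dependencies are not a DAG.
    order = list(TopologicalSorter(deps).static_order())

    state = set(initial_state)
    trace = []
    for name in order:
        skill = by_name[name]
        if not skill.preconditions <= state:
            # Node-level check failed; a repair step would go here (assumption).
            trace.append((name, "blocked"))
            continue
        state |= skill.effects  # simulate the skill by applying its effects
        trace.append((name, "ok"))
    return trace, state


if __name__ == "__main__":
    # Skills are supplied as a flat, unordered set; the graph recovers the order.
    demo = [
        Skill("take_apple", frozenset({"fridge_open"}), frozenset({"holding_apple"})),
        Skill("go_to_fridge", frozenset(), frozenset({"at_fridge"})),
        Skill("open_fridge", frozenset({"at_fridge"}), frozenset({"fridge_open"})),
    ]
    for step in execute(demo, frozenset())[0]:
        print(step)
```

Because a failed check is detected at a single node, only that node and the nodes depending on it need attention. This is the intuition behind bounding repair to a local neighborhood rather than replanning the whole sequence.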