🤖 AI Summary
This work addresses the poor cross-platform portability and low execution efficiency of skills in large language model (LLM) agents, which stem from the lack of standardized execution mechanisms. To overcome these limitations, the study is the first to bring compiler design principles into skill execution, proposing a capability-profile-based compilation strategy and an adaptive recompilation mechanism. By combining capability decomposition modeling, compile-time environment binding, concurrency extraction, and just-in-time (JIT) code solidification, the approach enables efficient and portable skill execution. Experiments across eight LLMs and three agent frameworks demonstrate significant improvements: task completion rates increase markedly, token consumption drops by up to 40%, parallel execution achieves a 3.2× speedup, and latency is reduced by 19–50×.
📝 Abstract
LLM agents increasingly adopt skills as reusable units of composition. Although skills are shared across diverse agent platforms, current systems treat them as raw context, so the same skill behaves inconsistently under different agents. This fragility undermines both skill portability and execution efficiency.
To address this challenge, we analyze 118,000 skills and draw inspiration from traditional compiler design. We treat skills as code and LLMs as heterogeneous processors. To make portability actionable, we decompose a skill's requirements into a set of primitive capabilities, and measure how well each model-harness pair supports them. Based on these capability profiles, we propose SkillRT, a compilation and runtime system designed for portable and efficient skill execution. At compile time, SkillRT performs capability-based compilation, environment binding, and concurrency extraction. At runtime, SkillRT applies JIT code solidification and adaptive recompilation for performance optimization.
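To make the compile-time idea concrete, the following is a minimal sketch of capability-profile-based compilation as described above: a skill's requirements are decomposed into primitive capabilities, each model-harness pair carries a measured support profile, and compilation binds each capability to either native execution or a fallback. All names here (`Capability`, `CapabilityProfile`, `compile_skill`, the threshold value) are illustrative assumptions, not SkillRT's actual API.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Capability(Enum):
    """Hypothetical primitive capabilities a skill may require."""
    TOOL_CALL = auto()
    FILE_IO = auto()
    CODE_EXEC = auto()
    PARALLEL_STEPS = auto()

@dataclass
class CapabilityProfile:
    """Measured support of one model-harness pair per capability (0..1)."""
    name: str
    support: dict  # Capability -> float

@dataclass
class Skill:
    name: str
    required: set  # set of Capability

def compile_skill(skill: Skill, profile: CapabilityProfile,
                  threshold: float = 0.8) -> dict:
    """Bind each required capability at compile time: well-supported
    capabilities run natively; weakly supported ones get a fallback
    (e.g. sequential emulation instead of parallel steps)."""
    plan = {}
    for cap in skill.required:
        score = profile.support.get(cap, 0.0)
        plan[cap] = "native" if score >= threshold else "fallback"
    return plan

# Example: a pair that handles tool calls well but parallelism poorly.
profile = CapabilityProfile(
    name="model-A/harness-X",
    support={Capability.TOOL_CALL: 0.95, Capability.PARALLEL_STEPS: 0.4},
)
skill = Skill("web-research", {Capability.TOOL_CALL, Capability.PARALLEL_STEPS})
print(compile_skill(skill, profile))
```

In this sketch, the same skill compiled against a different profile would yield a different plan, which is the sense in which the capability profile, rather than the raw skill text, drives execution.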
We evaluate SkillRT across eight LLMs of varying scales and three agent harnesses, covering SkillsBench and representative skill tasks. Results demonstrate that SkillRT significantly improves task completion rates across different models and environments while reducing token consumption by up to 40%. In terms of performance, SkillRT achieves up to 3.2× speedup with enhanced parallelism, and 19–50× latency reduction through code solidification.