🤖 AI Summary
This work addresses an inefficiency in current large language model (LLM) agents: once a skill is matched to a task, its full context is injected into the reasoning loop at runtime, introducing irrelevant information and redundant skill-specific reasoning that inflate computational and token costs. To mitigate this, the authors propose a compiler-runtime co-design framework. During offline compilation, a boundary-aware mechanism decomposes skill packages into minimal executable interfaces; at runtime, only the relevant components are loaded on demand, enabling fine-grained invocation. Skill artifacts compiled by a stronger model can also be reused by a smaller or more efficient runtime model, improving task accuracy in cases where interpreting the raw skill fails. Experiments on SkillsBench show a 57.44% reduction in tokens consumed during solving, 42.99% fewer reasoning rounds, and a 50.57% decrease in solving time, which lowers token-proportional cost by the same 57.44%.
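The compile-offline, load-on-demand split described above can be illustrated with a toy sketch. The sketch below is not SkillSmith's implementation. It assumes that a skill package is a markdown document whose `## ` headings mark operation boundaries, and it uses those headings as a stand-in for the paper's boundary-aware extraction. At runtime, plain keyword overlap stands in for whatever relevance matching the real system performs. The names `Interface`, `compile_skill`, and `load_relevant` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    """A hypothetical 'minimal executable interface' cut from a skill package."""
    name: str
    keywords: set[str]
    body: str


def _make(name: str, lines: list[str]) -> Interface:
    body = "\n".join(lines).strip()
    keywords = {w.lower().strip(".,:") for w in (name + " " + body).split()}
    return Interface(name, keywords, body)


def compile_skill(skill_text: str) -> list[Interface]:
    """Offline step (sketch): split a skill document at '## ' headings.

    Heading boundaries are an assumed stand-in for boundary-aware extraction.
    """
    interfaces: list[Interface] = []
    name, lines = None, []
    for line in skill_text.splitlines():
        if line.startswith("## "):
            if name is not None:
                interfaces.append(_make(name, lines))
            name, lines = line[3:].strip(), []
        elif name is not None:
            lines.append(line)
    if name is not None:
        interfaces.append(_make(name, lines))
    return interfaces


def load_relevant(interfaces: list[Interface], task: str, k: int = 1) -> list[Interface]:
    """Runtime step (sketch): keep only the top-k interfaces that overlap the task."""
    task_words = {w.lower().strip(".,:?") for w in task.split()}
    ranked = sorted(interfaces, key=lambda i: len(i.keywords & task_words), reverse=True)
    return [i for i in ranked[:k] if i.keywords & task_words]


# Hypothetical skill package used only for illustration.
SKILL = """# PDF skill
## extract_text
Use pdftotext to extract text from a PDF file.
## merge_pdfs
Combine several PDF files into one document with qpdf.
## rotate_pages
Rotate selected pages of a PDF.
"""

compiled = compile_skill(SKILL)
chosen = load_relevant(compiled, "Merge these two files into one")
print([i.name for i in chosen])  # → ['merge_pdfs']
```

In this sketch, only the `merge_pdfs` body would enter the agent's context, not the whole package. The real system's boundaries and relevance matching are presumably far richer than heading splits and word overlap.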
📝 Abstract
Recently, skills have been widely adopted in large language model (LLM)-based agent systems across various domains. In existing frameworks, skills are typically injected into the agent reasoning loop as contextual guidance once matched to a runtime task, enabling specialized task-solving capabilities. We find that this execution paradigm introduces two major sources of redundancy: irrelevant context injection and repeated skill-specific reasoning and planning. To this end, we propose SkillSmith, a boundary-first compiler-runtime framework that compiles skill packages offline into minimal executable interfaces. By extracting fine-grained operational boundaries from skills, SkillSmith enables agents to dynamically access and execute only the relevant components at runtime, thereby minimizing unnecessary context injection and redundant reasoning overhead. In the evaluation on the SkillsBench benchmark, SkillSmith reduces solve-stage token usage by 57.44%, thinking iterations by 42.99%, solve time by 50.57% (2.02x faster), and token-proportional monetary cost by 57.44% compared with using raw skills. Moreover, compiled artifacts produced by a stronger model can be reused by a smaller or more efficient runtime model, improving task accuracy in cases where raw skill interpretation fails. The source code and data are available at https://github.com/AetherHeart-AI/Aeloon.
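The headline numbers are related by simple arithmetic. A fractional time reduction r corresponds to a speedup of 1 / (1 - r), so a 50.57% reduction gives roughly 2.02x. When price is strictly proportional to tokens, cost falls by the same fraction as tokens, which is why the 57.44% figure appears twice. The function names and the per-token price below are hypothetical. Only the percentages come from the abstract.

```python
def speedup_from_reduction(reduction: float) -> float:
    """Convert a fractional time reduction r into a speedup factor 1 / (1 - r)."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def cost_reduction(token_reduction: float, price_per_token: float) -> float:
    """Fractional cost saving under a purely token-proportional price (assumed)."""
    baseline_cost = price_per_token * 1.0                   # normalized baseline tokens
    compiled_cost = price_per_token * (1.0 - token_reduction)
    return 1.0 - compiled_cost / baseline_cost


print(round(speedup_from_reduction(0.5057), 2))             # → 2.02
print(round(cost_reduction(0.5744, price_per_token=3e-6), 4))  # → 0.5744
```

Because the price cancels, the cost saving does not depend on the per-token rate. It would differ only if pricing were not purely token-proportional.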