🤖 AI Summary
This work addresses the challenges faced in industrial settings where reliance on public APIs is constrained by data-security and budget limitations, while small language models (SLMs) often lack sufficient generalization capability for complex tasks. To bridge this gap, the authors propose and formally define the Agent Skill framework, offering the first mathematical formulation of agent skill invocation. By integrating context engineering with a skill selection mechanism, the study systematically evaluates language models of varying sizes on both open-source tasks and a real-world insurance claims dataset. Experimental results show that medium-scale SLMs (12B–30B parameters) achieve significantly improved task accuracy under this framework. Notably, an 80B code-specialized model matches closed-source baselines in performance while improving GPU efficiency, confirming the framework's effectiveness in enhancing SLM capabilities.
📝 Abstract
The Agent Skill framework, now widely and officially supported by major players such as GitHub Copilot, LangChain, and OpenAI, performs especially well with proprietary models by improving context engineering, reducing hallucinations, and boosting task accuracy. Motivated by these observations, this work investigates whether the Agent Skill paradigm provides similar benefits to small language models (SLMs). This question matters in industrial scenarios where continuous reliance on public APIs is infeasible due to data-security and budget constraints, and where SLMs often show limited generalization in highly customized settings. This work introduces a formal mathematical definition of the Agent Skill process, followed by a systematic evaluation of language models of varying sizes across multiple use cases. The evaluation encompasses two open-source tasks and a real-world insurance claims dataset. The results show that tiny models struggle with reliable skill selection, while moderately sized SLMs (approximately 12B–30B parameters) benefit substantially from the Agent Skill approach. Moreover, code-specialized variants at around 80B parameters achieve performance comparable to closed-source baselines while improving GPU efficiency. Collectively, these findings provide a comprehensive and nuanced characterization of the framework's capabilities and constraints, and offer actionable insights for the effective deployment of Agent Skills in SLM-centered environments.