Small Language Models for Efficient Agentic Tool Calling: Outperforming Large Models with Targeted Fine-tuning

📅 2025-12-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
The high computational cost and resource demands of large language models (LLMs) hinder their practical deployment in generative AI applications. Method: This work investigates the feasibility of small language models (SLMs) for tool-augmented agent tasks, proposing a lightweight adaptation of the 350M-parameter OPT model via single-epoch supervised fine-tuning (SFT) on the ToolBench benchmark, implemented using the Hugging Face TRL framework without reinforcement learning or complex chain-of-thought reasoning. Results: The fine-tuned SLM achieves a 77.55% tool-call success rate on ToolBench, substantially outperforming strong LLM baselines including ChatGPT-CoT (26.00%) and ToolLLaMA-DFS (30.18%). It demonstrates robust performance across enterprise-grade tasks such as document summarization, question answering, and structured data parsing, enabling low-cost, high-efficiency production deployment. This study provides the first empirical evidence that domain-specialized SLMs can surpass conventional performance expectations in tool utilization, establishing a novel paradigm for efficient, scalable AI agents.

📝 Abstract
As organizations scale adoption of generative AI, model cost optimization and operational efficiency have emerged as critical factors determining sustainability and accessibility. While Large Language Models (LLMs) demonstrate impressive capabilities across diverse tasks, their extensive computational requirements make them cost-prohibitive for routine enterprise use. This limitation motivates the exploration of Small Language Models (SLMs), which can deliver comparable performance in targeted applications while drastically reducing infrastructure overhead (Irugalbandara et al., 2023). In this work, we investigate the feasibility of replacing LLM-driven workflows with optimized SLMs. We trained a domain-adapted SLM to execute representative tasks traditionally handled by LLMs, such as document summarization, question answering, and structured data interpretation. As part of the experiment, we investigated fine-tuning the facebook/opt-350m model (single epoch only) using the Hugging Face TRL (Transformer Reinforcement Learning) library, specifically its Supervised Fine-Tuning (SFT) trainer. The OPT-350M model was released by Meta AI in 2022 as part of the OPT (Open Pretrained Transformer) family of models. Similar studies demonstrate that even models at the 350M parameter scale can meaningfully contribute to instruction-tuning pipelines (Mekala et al., 2024). Experimental results demonstrated that our fine-tuned SLM achieves exceptional performance with a 77.55% pass rate on ToolBench evaluation, significantly outperforming all baseline models including ChatGPT-CoT (26.00%), ToolLLaMA-DFS (30.18%), and ToolLLaMA-CoT (16.27%). These findings emphasize that thoughtful design and targeted training of SLMs can significantly lower barriers to adoption, enabling cost-effective, large-scale integration of generative AI into production systems.
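The headline metric above is a pass rate over tool calls. As a rough illustration only, the following sketch shows how such a metric can be computed: parse each model completion into a structured tool call and count exact matches against the reference. This is not the paper's or ToolBench's actual evaluation harness; the JSON call format and the function names here are assumptions made for the example.

```python
# Hedged sketch of a ToolBench-style tool-call pass rate.
# Assumption: the model emits calls as JSON like
# '{"tool": "...", "arguments": {...}}'. This format and all
# names below are illustrative, not the paper's real harness.
import json


def parse_tool_call(text):
    """Parse a completion into (tool_name, arguments), or None if invalid."""
    try:
        call = json.loads(text)
        return call["tool"], call["arguments"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return None


def pass_rate(predictions, references):
    """Fraction of cases where the predicted tool name and arguments
    exactly match the reference call."""
    passed = 0
    for pred_text, ref in zip(predictions, references):
        parsed = parse_tool_call(pred_text)
        if parsed is not None and parsed == (ref["tool"], ref["arguments"]):
            passed += 1
    return passed / len(references)


preds = [
    '{"tool": "get_weather", "arguments": {"city": "Paris"}}',  # correct
    '{"tool": "get_weather", "arguments": {"city": "Rome"}}',   # wrong args
    'not a tool call',                                          # malformed
]
refs = [
    {"tool": "get_weather", "arguments": {"city": "Paris"}},
    {"tool": "get_weather", "arguments": {"city": "Berlin"}},
    {"tool": "search", "arguments": {"query": "ToolBench"}},
]
print(round(pass_rate(preds, refs), 4))  # 1 of 3 matches -> 0.3333
```

Real harnesses are typically more lenient (e.g. order-insensitive argument comparison or execution-based checks), but the exact-match version above conveys the shape of the metric.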
Problem

Research questions and friction points this paper is trying to address.

Optimizing model cost and operational efficiency for generative AI adoption
Replacing LLMs with fine-tuned SLMs for tool-calling tasks
Reducing computational overhead while maintaining performance in agentic workflows
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned small language models for tool calling
Targeted training reduces computational costs significantly
Domain adaptation enables efficient enterprise AI integration
Polaris Jhandi
Amazon Web Services
Owais Kazi
Amazon Web Services
Shreyas Subramanian
Amazon Web Services
Neel Sendas
Amazon Web Services
Generative AI · Artificial Intelligence · Deep Learning · Reinforcement Learning · Foundation Models