🤖 AI Summary
The high computational cost and resource demands of large language models (LLMs) hinder their practical deployment in generative AI applications. This work investigates the feasibility of small language models (SLMs) for tool-augmented agent tasks, proposing a lightweight adaptation of the 350M-parameter OPT model via single-round supervised fine-tuning (SFT) on the ToolBench benchmark, implemented with the Hugging Face TRL framework and requiring neither reinforcement learning nor complex chain-of-thought reasoning. The fine-tuned SLM achieves a 77.55% tool-call success rate on ToolBench, substantially outperforming strong LLM baselines including ChatGPT-CoT (26.00%) and ToolLLaMA-DFS (30.18%). It also performs robustly on enterprise-grade tasks such as document summarization, question answering, and structured data parsing, enabling low-cost, high-efficiency production deployment. The study presents the first empirical evidence that domain-specialized SLMs can exceed conventional performance expectations in tool use, establishing a paradigm for efficient, scalable AI agents.
📝 Abstract
As organizations scale their adoption of generative AI, model cost optimization and operational efficiency have emerged as critical factors determining sustainability and accessibility. While Large Language Models (LLMs) demonstrate impressive capabilities across diverse tasks, their extensive computational requirements make them cost-prohibitive for routine enterprise use. This limitation motivates the exploration of Small Language Models (SLMs), which can deliver comparable performance in targeted applications while drastically reducing infrastructure overhead (Irugalbandara et al., 2023). In this work, we investigate the feasibility of replacing LLM-driven workflows with optimized SLMs. We train a domain-adapted SLM to execute representative tasks traditionally handled by LLMs, such as document summarization, question answering, and structured data interpretation. Specifically, we fine-tune the facebook/opt-350m model for a single epoch using the Supervised Fine-Tuning (SFT) trainer from the Hugging Face TRL (Transformer Reinforcement Learning) library. OPT-350M was released by Meta AI in 2022 as part of the OPT (Open Pretrained Transformer) family of models, and related studies show that even models at the 350M-parameter scale can meaningfully contribute to instruction-tuning pipelines (Mekala et al., 2024). Experimental results show that our fine-tuned SLM achieves a 77.55% pass rate on the ToolBench evaluation, significantly outperforming all baseline models, including ChatGPT-CoT (26.00%), ToolLLaMA-DFS (30.18%), and ToolLLaMA-CoT (16.27%). These findings emphasize that thoughtful design and targeted training of SLMs can significantly lower barriers to adoption, enabling cost-effective, large-scale integration of generative AI into production systems.
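The abstract names the concrete recipe: single-epoch SFT of facebook/opt-350m with TRL's SFT trainer. A minimal sketch of such a run might look like the following; the output directory and the lazy-import structure are illustrative assumptions, since the paper specifies only the model and the single epoch:

```python
# Sketch of single-epoch supervised fine-tuning (SFT) of OPT-350M with
# Hugging Face TRL. Hyperparameters other than the epoch count are
# illustrative assumptions, not values reported by the authors.

TRAIN_SETTINGS = {
    "model_name": "facebook/opt-350m",   # model named in the abstract
    "num_train_epochs": 1,               # single epoch, per the abstract
    "output_dir": "opt-350m-toolbench-sft",  # assumed output path
}


def run_sft(train_dataset):
    """Launch a single-epoch SFT run (requires `pip install trl`)."""
    # Imported lazily so the settings above can be inspected without
    # pulling in the heavy training dependencies.
    from trl import SFTConfig, SFTTrainer

    config = SFTConfig(
        output_dir=TRAIN_SETTINGS["output_dir"],
        num_train_epochs=TRAIN_SETTINGS["num_train_epochs"],
    )
    trainer = SFTTrainer(
        model=TRAIN_SETTINGS["model_name"],  # resolved via the HF Hub
        args=config,
        train_dataset=train_dataset,
    )
    trainer.train()
    return trainer
```

In practice `train_dataset` would hold tool-use demonstrations, e.g. ToolBench traces converted into prompt/response pairs in a format the SFT trainer accepts.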