🤖 AI Summary
Existing research lacks a systematic, multidimensional evaluation of small language models (SLMs) covering both performance and environmental impact. To address this gap, we introduce SLM-Bench—the first comprehensive benchmark specifically designed for SLMs—assessing accuracy, computational efficiency, and sustainability for 15 models on nine NLP tasks under unified hardware conditions. The benchmark quantifies 11 cross-dimensional metrics, draws on 23 datasets spanning 14 domains, and runs controlled experiments across four hardware configurations. It establishes an open-source, fully reproducible evaluation pipeline. Crucially, SLM-Bench enables the first empirical analysis of the accuracy–energy trade-off in SLMs, identifying several highly energy-efficient models. This work provides a standardized, evidence-based evaluation framework to advance green AI research and practice.
📝 Abstract
Small Language Models (SLMs) offer computational efficiency and accessibility, yet a systematic evaluation of their performance and environmental impact remains lacking. We introduce SLM-Bench, the first benchmark specifically designed to assess SLMs across multiple dimensions, including accuracy, computational efficiency, and sustainability. SLM-Bench evaluates 15 SLMs on 9 NLP tasks using 23 datasets spanning 14 domains. The evaluation is conducted on 4 hardware configurations, enabling a rigorous comparison of model effectiveness. Unlike prior benchmarks, SLM-Bench quantifies 11 metrics across correctness, computation, and consumption, enabling a holistic assessment of efficiency trade-offs. Our evaluation is performed under controlled hardware conditions, ensuring fair comparisons across models. We develop an open-source benchmarking pipeline with standardized evaluation protocols to facilitate reproducibility and further research. Our findings highlight the diverse trade-offs among SLMs: some models excel in accuracy, while others achieve superior energy efficiency. SLM-Bench sets a new standard for SLM evaluation, bridging the gap between resource efficiency and real-world applicability.
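To illustrate the kind of accuracy–energy trade-off the abstract describes, the sketch below ranks models by a simple accuracy-per-kWh ratio. This is a hypothetical illustration, not part of SLM-Bench's actual metric suite: the model names, scores, and the `efficiency` helper are all invented for demonstration.

```python
# Hypothetical sketch of an accuracy-per-energy ranking.
# All model names and numbers are illustrative, not benchmark results.

results = {
    "model_a": {"accuracy": 0.82, "energy_kwh": 1.4},
    "model_b": {"accuracy": 0.78, "energy_kwh": 0.6},
    "model_c": {"accuracy": 0.85, "energy_kwh": 2.1},
}

def efficiency(metrics: dict) -> float:
    """Accuracy delivered per kWh consumed (higher is better)."""
    return metrics["accuracy"] / metrics["energy_kwh"]

# Rank models from most to least energy-efficient.
ranked = sorted(results, key=lambda m: efficiency(results[m]), reverse=True)
for name in ranked:
    print(f"{name}: {efficiency(results[name]):.2f} accuracy/kWh")
```

A ranking like this makes the trade-off concrete: the most accurate model is not necessarily the most efficient once energy consumption is factored in.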