SLM-Bench: A Comprehensive Benchmark of Small Language Models on Environmental Impacts -- Extended Version

📅 2025-08-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing research lacks a systematic, multidimensional evaluation of small language models (SLMs) covering both performance and environmental impact. To address this gap, we introduce SLM-Bench, the first comprehensive benchmark designed specifically for SLMs, assessing accuracy, computational efficiency, and sustainability for 15 models on nine NLP tasks under unified hardware conditions. The benchmark quantifies 11 cross-dimensional metrics, draws on 23 datasets spanning 14 domains, and runs controlled experiments on four hardware configurations, all through an open-source, fully reproducible evaluation pipeline. Crucially, SLM-Bench enables the first empirical analysis of the accuracy–energy trade-off in SLMs, identifying several highly energy-efficient models. This work provides a standardized, evidence-based evaluation framework to advance green AI research and practice.

📝 Abstract
Small Language Models (SLMs) offer computational efficiency and accessibility, yet a systematic evaluation of their performance and environmental impact remains lacking. We introduce SLM-Bench, the first benchmark specifically designed to assess SLMs across multiple dimensions, including accuracy, computational efficiency, and sustainability metrics. SLM-Bench evaluates 15 SLMs on 9 NLP tasks using 23 datasets spanning 14 domains. The evaluation is conducted on 4 hardware configurations, providing a rigorous comparison of their effectiveness. Unlike prior benchmarks, SLM-Bench quantifies 11 metrics across correctness, computation, and consumption, enabling a holistic assessment of efficiency trade-offs. Our evaluation considers controlled hardware conditions, ensuring fair comparisons across models. We develop an open-source benchmarking pipeline with standardized evaluation protocols to facilitate reproducibility and further research. Our findings highlight the diverse trade-offs among SLMs, where some models excel in accuracy while others achieve superior energy efficiency. SLM-Bench sets a new standard for SLM evaluation, bridging the gap between resource efficiency and real-world applicability.
Problem

Research questions and friction points this paper is trying to address.

Evaluating small language models' performance and environmental impact systematically
Assessing SLMs across accuracy, computational efficiency, and sustainability metrics
Quantifying trade-offs between correctness, computation, and energy consumption
Innovation

Methods, ideas, or system contributions that make the work stand out.

Comprehensive benchmark for small language models
Evaluates 15 models across 9 NLP tasks
Quantifies 11 metrics including energy efficiency
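The accuracy-versus-energy trade-off the benchmark quantifies can be illustrated with a minimal sketch. The paper's actual pipeline is not shown here; the helper below, the device power table, and the kWh estimate from elapsed time are all simplifying assumptions for illustration (real benchmarks typically read hardware energy counters instead).

```python
import time

# Assumed average power draw per device in watts -- illustrative values,
# not figures from the paper.
DEVICE_POWER_W = {"gpu_a": 350.0, "gpu_b": 400.0}

def evaluate_with_energy(model_fn, inputs, labels, device="gpu_a"):
    """Run a model over a dataset and report accuracy alongside a rough
    energy estimate (kWh) derived from elapsed time x assumed power draw."""
    start = time.perf_counter()
    predictions = [model_fn(x) for x in inputs]
    elapsed_s = time.perf_counter() - start

    correct = sum(p == y for p, y in zip(predictions, labels))
    accuracy = correct / len(labels)

    # watts x seconds = joules; 1 kWh = 3.6e6 J
    energy_kwh = DEVICE_POWER_W[device] * elapsed_s / 3_600_000

    return {"accuracy": accuracy, "energy_kwh": energy_kwh, "latency_s": elapsed_s}

# Toy usage with a trivial "model" that echoes its input.
result = evaluate_with_energy(lambda x: x, inputs=[0, 1, 1], labels=[0, 1, 0])
```

Comparing such per-model reports across hardware configurations is what surfaces models that trade a little accuracy for much lower energy consumption.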
Nghiem Thanh Pham
FPT University, Vietnam
Tung Kieu
Aalborg University, Department of Computer Science
Data Mining · Data Management · Spatio-Temporal Data · Time Series Analysis
Duc-Manh Nguyen
Technische Universität Berlin, Germany
Son Ha Xuan
RMIT University, Vietnam
Nghia Duong-Trung
German Research Center for Artificial Intelligence (DFKI), Germany
Danh Le-Phuoc
Technische Universität Berlin, Germany