SLM-Bench: A Comprehensive Benchmark of Small Language Models on Environmental Impacts -- Extended Version

📅 2025-08-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing research lacks a systematic, multidimensional evaluation of small language models (SLMs) covering both performance and environmental impact. To address this gap, we introduce SLM-Bench, the first comprehensive benchmark designed specifically for SLMs, assessing accuracy, computational efficiency, and sustainability for 15 models on nine NLP tasks under unified hardware conditions. The benchmark quantifies 11 cross-dimensional metrics, draws on 23 datasets spanning 14 domains, and runs controlled experiments on four hardware configurations, all through an open-source, fully reproducible evaluation pipeline. Crucially, SLM-Bench enables the first empirical analysis of the accuracy–energy trade-off in SLMs, identifying several highly energy-efficient models. This work provides a standardized, evidence-based evaluation framework to advance green AI research and practice.

📝 Abstract
Small Language Models (SLMs) offer computational efficiency and accessibility, yet a systematic evaluation of their performance and environmental impact remains lacking. We introduce SLM-Bench, the first benchmark specifically designed to assess SLMs across multiple dimensions, including accuracy, computational efficiency, and sustainability metrics. SLM-Bench evaluates 15 SLMs on 9 NLP tasks using 23 datasets spanning 14 domains. The evaluation is conducted on 4 hardware configurations, providing a rigorous comparison of their effectiveness. Unlike prior benchmarks, SLM-Bench quantifies 11 metrics across correctness, computation, and consumption, enabling a holistic assessment of efficiency trade-offs. Our evaluation considers controlled hardware conditions, ensuring fair comparisons across models. We develop an open-source benchmarking pipeline with standardized evaluation protocols to facilitate reproducibility and further research. Our findings highlight the diverse trade-offs among SLMs, where some models excel in accuracy while others achieve superior energy efficiency. SLM-Bench sets a new standard for SLM evaluation, bridging the gap between resource efficiency and real-world applicability.
Problem

Research questions and friction points this paper is trying to address.

Evaluating small language models' performance and environmental impact systematically
Assessing SLMs across accuracy, computational efficiency, and sustainability metrics
Quantifying trade-offs between correctness, computation, and energy consumption
Innovation

Methods, ideas, or system contributions that make the work stand out.

Comprehensive benchmark for small language models
Evaluates 15 models across 9 NLP tasks
Quantifies 11 metrics including energy efficiency
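The accuracy-versus-energy trade-off the benchmark quantifies can be illustrated with a minimal sketch. The paper's actual pipeline is not shown here; the helper below, the device power table, and the kWh estimate from elapsed time are all simplifying assumptions for illustration (real benchmarks typically read hardware energy counters instead).

```python
import time

# Assumed average power draw per device in watts -- illustrative values,
# not figures from the paper.
DEVICE_POWER_W = {"gpu_a": 350.0, "gpu_b": 400.0}

def evaluate_with_energy(model_fn, inputs, labels, device="gpu_a"):
    """Run a model over a dataset and report accuracy alongside a rough
    energy estimate (kWh) derived from elapsed time x assumed power draw."""
    start = time.perf_counter()
    predictions = [model_fn(x) for x in inputs]
    elapsed_s = time.perf_counter() - start

    correct = sum(p == y for p, y in zip(predictions, labels))
    accuracy = correct / len(labels)

    # watts x seconds = joules; 1 kWh = 3.6e6 J
    energy_kwh = DEVICE_POWER_W[device] * elapsed_s / 3_600_000

    return {"accuracy": accuracy, "energy_kwh": energy_kwh, "latency_s": elapsed_s}

# Toy usage with a trivial "model" that echoes its input.
result = evaluate_with_energy(lambda x: x, inputs=[0, 1, 1], labels=[0, 1, 0])
```

Comparing such per-model reports across hardware configurations is what surfaces models that trade a little accuracy for much lower energy consumption.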
Nghiem Thanh Pham
FPT University, Vietnam
Tung Kieu
Aalborg University, Department of Computer Science
Data Mining · Data Management · Spatio-Temporal Data · Time Series Analysis
Duc-Manh Nguyen
Technische Universität Berlin, Germany
Son Ha Xuan
RMIT University, Vietnam
Nghia Duong-Trung
German Research Center for Artificial Intelligence (DFKI), Germany
Danh Le-Phuoc
Technische Universität Berlin, Germany