Characterizing and Understanding Energy Footprint and Efficiency of Small Language Model on Edges

📅 2025-11-06
🤖 AI Summary
This study addresses the energy-efficiency bottleneck in deploying small language models (SLMs) on edge devices. We systematically evaluate the inference energy consumption and performance of Llama 3.2, Phi-3 Mini, TinyLlama, and Gemma 2 on Raspberry Pi 5, Jetson Nano, and Jetson Orin Nano. Using standardized energy-efficiency benchmarks across hardware (CPU/GPU) and models, we quantitatively characterize, for the first time, how GPU acceleration, memory bandwidth, and model architecture influence power draw. Results show three things. The Jetson Orin Nano achieves the best energy efficiency when GPU acceleration is enabled. Llama 3.2 delivers the best trade-off between accuracy and power consumption. TinyLlama is the most suitable for ultra-low-power scenarios. We propose a hardware–model co-optimization pathway and establish a reproducible energy-efficiency evaluation framework with empirically grounded design guidelines for resource-constrained edge AI deployments.
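
The summary does not say how energy is measured. One common approach, assumed here and not taken from the paper, is the following:

- sample board power at a fixed rate while inference runs;
- integrate those samples over time.

The trapezoidal rule below turns watt readings into joules. The function name `energy_joules` and the sample values are hypothetical.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate sampled power (W) over time (s) with the trapezoidal rule; returns joules.

    Hypothetical helper for illustration, not the paper's measurement code.
    """
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have the same length")
    if len(timestamps_s) < 2:
        raise ValueError("need at least two samples to integrate")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between two consecutive power readings
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


# Hypothetical readings: 5 W baseline for 2 s with a 7 W spike at t = 1 s.
samples_t = [0.0, 0.5, 1.0, 1.5, 2.0]
samples_p = [5.0, 5.0, 7.0, 5.0, 5.0]
print(energy_joules(samples_t, samples_p))  # 11.0
```

For these illustrative samples the result is 11.0 J. If desired, you can subtract an idle-power baseline from each reading before integrating. That isolates the energy attributable to inference itself.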

📝 Abstract
Cloud-based large language models (LLMs) and their variants have significantly influenced real-world applications. Deploying smaller models, i.e., small language models (SLMs), on edge devices offers additional advantages, such as reduced latency and independence from network connectivity. However, the limited computing resources and constrained energy budgets of edge devices make efficient deployment challenging. This study evaluates the power efficiency of representative SLMs (Llama 3.2, Phi-3 Mini, TinyLlama, and Gemma 2) on Raspberry Pi 5, Jetson Nano, and Jetson Orin Nano in both CPU and GPU configurations. Results show that Jetson Orin Nano with GPU acceleration achieves the best performance per unit of energy, significantly outperforming CPU-based setups. Llama 3.2 provides the best balance of accuracy and power efficiency, while TinyLlama is well suited to low-power environments at the cost of reduced accuracy. In contrast, Phi-3 Mini consumes the most energy despite its high accuracy. In addition, GPU acceleration, memory bandwidth, and model architecture play key roles in optimizing inference energy efficiency. Our empirical analysis offers practical insights for AI, smart systems, and mobile ad-hoc platforms on navigating the trade-offs among accuracy, inference latency, and power efficiency in energy-constrained environments.
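
GPU acceleration can win on efficiency even when it raises instantaneous power draw. The reason is that energy is power multiplied by time. The sketch below assumes efficiency is reported as generated tokens per joule; the paper's exact metric may differ. The class name, configuration names and numbers are all hypothetical, not measurements from the study.

```python
from dataclasses import dataclass


@dataclass
class InferenceRun:
    """One hypothetical inference run on an edge configuration."""
    name: str
    avg_power_w: float  # mean board power during the run, in watts
    latency_s: float    # wall-clock time to generate the output, in seconds
    tokens: int         # number of generated tokens

    @property
    def energy_j(self) -> float:
        # Energy (J) = average power (W) x time (s)
        return self.avg_power_w * self.latency_s

    @property
    def tokens_per_joule(self) -> float:
        # Higher is better: more output per unit of energy
        return self.tokens / self.energy_j


# Illustrative numbers only, not results from the paper.
runs = [
    InferenceRun("cpu-config", avg_power_w=6.0, latency_s=20.0, tokens=100),
    InferenceRun("gpu-config", avg_power_w=10.0, latency_s=5.0, tokens=100),
]

# Rank configurations from most to least energy-efficient
for run in sorted(runs, key=lambda r: r.tokens_per_joule, reverse=True):
    print(f"{run.name}: {run.energy_j:.1f} J, {run.tokens_per_joule:.2f} tokens/J")
```

In this example the hypothetical GPU run draws more power (10 W vs. 6 W) but finishes four times sooner. It therefore uses 50 J instead of 120 J for the same 100 tokens.
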
Problem

Research questions and friction points this paper is trying to address.

Evaluating power efficiency of small language models on edge devices
Analyzing trade-offs between model accuracy and energy consumption
Identifying optimal hardware configurations for energy-efficient SLM deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluating SLM energy efficiency on edge devices
Comparing CPU and GPU acceleration for SLMs
Analyzing accuracy-power tradeoffs in model selection
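
Choosing a model for a given board means balancing two objectives: higher accuracy and lower energy per query. One standard way to expose this trade-off is to keep only the Pareto-optimal configurations; this is assumed here, not taken from the paper. A configuration is Pareto-optimal when no other configuration is at least as accurate and at least as frugal while being strictly better on one of the two. The configuration names and values below are hypothetical.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    name: str
    accuracy: float  # higher is better (e.g., benchmark accuracy in [0, 1])
    energy_j: float  # lower is better (energy per query, in joules)


def dominates(a: Config, b: Config) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.energy_j <= b.energy_j
    strictly_better = a.accuracy > b.accuracy or a.energy_j < b.energy_j
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep configurations not dominated by any other (O(n^2), fine for a handful of setups)."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Hypothetical model/board combinations.
candidates = [
    Config("config-A", accuracy=0.60, energy_j=30.0),
    Config("config-B", accuracy=0.70, energy_j=50.0),
    Config("config-C", accuracy=0.65, energy_j=60.0),  # dominated by config-B
    Config("config-D", accuracy=0.50, energy_j=20.0),
]

print([c.name for c in pareto_front(candidates)])  # ['config-A', 'config-B', 'config-D']
```

A deployment would then pick from this front according to its energy budget. For example, it might choose the most accurate configuration whose energy per query still fits the budget.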
Md Romyull Islam
Department of Computer Science, Kennesaw State University, GA, USA
Bobin Deng
Assistant Professor of Computer Science, Kennesaw State University
Computer Architecture, Machine Learning, Neuromorphic Computing, AI for Science
Nobel Dhar
Department of Computer Science, Kennesaw State University, GA, USA
Tu N. Nguyen
Associate Professor, Kennesaw State University | IEEE Senior Member
Network Systems, Quantum Computing, Quantum Networks
Selena He
Department of Computer Science, Kennesaw State University, GA, USA
Yong Shi
Department of Computer Science, Kennesaw State University, GA, USA
Kun Suo
Department of Computer Science, Kennesaw State University, GA, USA