Characterizing and Understanding Energy Footprint and Efficiency of Small Language Model on Edges

📅 2025-11-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the energy-efficiency bottleneck in deploying small language models (SLMs) on edge devices. We systematically evaluate the inference energy consumption and performance of Llama 3.2, Phi-3 Mini, TinyLlama, and Gemma 2 on the Raspberry Pi 5, Jetson Nano, and Jetson Orin Nano. Using standardized energy-efficiency benchmarks that span hardware configurations (CPU and GPU) and models, we quantify, for the first time, how GPU acceleration, memory bandwidth, and model architecture affect power draw. Results show that the Jetson Orin Nano achieves the best energy efficiency when GPU acceleration is enabled; Llama 3.2 delivers the best trade-off between accuracy and power consumption; and TinyLlama is most suitable for ultra-low-power scenarios. We propose a hardware–model co-optimization pathway and establish a reproducible energy-efficiency evaluation framework, with empirically grounded design guidelines for resource-constrained edge AI deployments.

📝 Abstract

Cloud-based large language models (LLMs) and their variants have significantly influenced real-world applications. Deploying smaller models, i.e., small language models (SLMs), on edge devices offers additional advantages, such as reduced latency and independence from network connectivity. However, the limited computing resources and constrained energy budgets of edge devices make efficient deployment challenging. This study evaluates the power efficiency of representative SLMs (Llama 3.2, Phi-3 Mini, TinyLlama, and Gemma 2) on Raspberry Pi 5, Jetson Nano, and Jetson Orin Nano (CPU and GPU configurations). Results show that Jetson Orin Nano with GPU acceleration achieves the highest energy-to-performance ratio, significantly outperforming CPU-based setups. Llama 3.2 provides the best balance of accuracy and power efficiency, while TinyLlama is well-suited for low-power environments at the cost of reduced accuracy. In contrast, Phi-3 Mini consumes the most energy despite its high accuracy. In addition, GPU acceleration, memory bandwidth, and model architecture are key to optimizing inference energy efficiency. Our empirical analysis offers practical insights for AI, smart systems, and mobile ad-hoc platforms to balance trade-offs among accuracy, inference latency, and power efficiency in energy-constrained environments.
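The abstract does not say how energy per inference is measured. As a rough sketch only, one common approach is to sample board power at a fixed interval during generation, integrate the samples to get joules, and divide the generated token count by that energy. The function names, the fixed sampling interval, and the trapezoidal rule below are assumptions for illustration, not the authors' method.

```python
from typing import Sequence


def energy_joules(power_watts: Sequence[float], interval_s: float) -> float:
    """Approximate energy (J) from evenly spaced power samples (W).

    Uses the trapezoidal rule; the fixed sampling interval is an assumption.
    """
    if len(power_watts) < 2:
        raise ValueError("need at least two power samples")
    total = 0.0
    for a, b in zip(power_watts, power_watts[1:]):
        total += 0.5 * (a + b) * interval_s
    return total


def tokens_per_joule(tokens: int, power_watts: Sequence[float], interval_s: float) -> float:
    """Hypothetical efficiency metric: generated tokens per joule consumed."""
    return tokens / energy_joules(power_watts, interval_s)
```

A higher tokens-per-joule value means more output for the same energy, so two device and model pairings can be compared directly as long as the prompt and generation length are held fixed.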

Problem

Research questions and friction points this paper is trying to address.

Evaluating power efficiency of small language models on edge devices

Analyzing trade-offs between model accuracy and energy consumption

Identifying optimal hardware configurations for energy-efficient SLM deployment
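One way to make the accuracy versus energy trade-off concrete is to keep only the configurations that no other configuration beats on both axes (a Pareto front). The sketch below is illustrative: the tuple layout and the function name are assumptions, and the page does not state that the paper uses this analysis.

```python
from typing import List, Tuple

# (name, accuracy, joules_per_token); the layout is a hypothetical choice.
Candidate = Tuple[str, float, float]


def pareto_front(candidates: List[Candidate]) -> List[str]:
    """Return names of candidates not dominated by any other candidate.

    A candidate is dominated if another one is at least as accurate and
    uses no more energy, and is strictly better on at least one of the two.
    """
    front = []
    for name, acc, energy in candidates:
        dominated = any(
            (a2 >= acc and e2 <= energy) and (a2 > acc or e2 < energy)
            for _, a2, e2 in candidates
        )
        if not dominated:
            front.append(name)
    return front
```

Every configuration left on the front is a defensible choice; which one to deploy then depends on the energy budget of the target device.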

Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluating SLM energy efficiency on edge devices

Comparing CPU and GPU acceleration for SLMs

Analyzing accuracy-power tradeoffs in model selection

🔎 Similar Papers

Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches

2024-08-20 · arXiv.org · Citations: 7
