MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection

📅 2025-06-24
📈 Citations: 0 · Influential: 0
🤖 AI Summary
On-device LLMs suffer from high energy consumption during decoding on resource-constrained mobile devices, a problem exacerbated by the inability to modify system-level configurations (e.g., no root access or kernel-level tuning). To address this, the authors propose Adaptive Energy-Centric Core Selection (AECS), a user-space, inference-engine-level optimization that dynamically schedules memory-bound decoding work onto energy-efficient CPU cores, leveraging hardware heterogeneity and runtime workload awareness under strict latency constraints. AECS is integrated into the lightweight MNN framework, yielding the MNN-AECS inference engine; to the authors' knowledge, it is the first fully user-space, decoding-stage-specific energy optimization for mobile LLMs. Evaluations across five Android and two iOS devices with five mainstream LLMs show that MNN-AECS achieves a 23% average energy reduction over baseline MNN without sacrificing inference speed; compared to llama.cpp and similar engines, it reduces energy consumption by 39%–78% while accelerating inference by 12%–363%.


📝 Abstract
As the demand for on-device Large Language Model (LLM) inference grows, energy efficiency has become a major concern, especially for battery-limited mobile devices. Our analysis shows that the memory-bound LLM decode phase dominates energy use, and yet most existing works focus on accelerating the prefill phase, neglecting energy concerns. We introduce Adaptive Energy-Centric Core Selection (AECS) and integrate it into MNN to create the energy-efficient version, MNN-AECS, the first engine-level system solution without requiring root access or OS modifications for energy-efficient LLM decoding. MNN-AECS is designed to reduce LLM decoding energy while keeping decode speed within an acceptable slowdown threshold by dynamically selecting low-power CPU cores. MNN-AECS is evaluated across 5 Android and 2 iOS devices on 5 popular LLMs of various sizes. Compared to original MNN, MNN-AECS cuts down energy use by 23% without slowdown averaged over all 7 devices and 4 datasets. Against other engines, including llama.cpp, executorch, mllm, and MediaPipe, MNN-AECS delivers 39% to 78% energy saving and 12% to 363% speedup on average.
Problem

Research questions and friction points this paper is trying to address.

Optimize energy use for LLM decoding on mobile devices
Reduce memory-bound LLM decode phase energy consumption
Enable efficient LLM inference without OS modifications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Energy-Centric Core Selection (AECS)
Dynamic low-power CPU core selection
Engine-level system without OS modifications
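The paper does not publish its scheduling code here, but the core mechanism (user-space, root-free selection of low-power cores with a latency guard) can be sketched with standard Linux affinity calls. The function names, probe count, and 10% slowdown budget below are illustrative assumptions, not the paper's actual implementation; real engines would also discover little cores from cpufreq capacity data rather than hard-coding them.

```python
import os
import time

SLOWDOWN_THRESHOLD = 1.10  # assumed budget: tolerate at most a 10% decode slowdown


def pin_threads(core_ids):
    """Restrict the calling process to the given CPU cores (Linux/Android, no root needed)."""
    os.sched_setaffinity(0, core_ids)  # pid 0 = the calling process


def adaptive_core_selection(decode_step, little_cores, big_cores, n_probe=8):
    """Illustrative AECS-style selection: probe decode latency on big cores,
    then on little cores, and keep the little cores only if the measured
    slowdown stays within the budget; otherwise revert to big cores."""
    pin_threads(big_cores)
    t0 = time.perf_counter()
    for _ in range(n_probe):
        decode_step()
    big_time = time.perf_counter() - t0

    pin_threads(little_cores)
    t0 = time.perf_counter()
    for _ in range(n_probe):
        decode_step()
    little_time = time.perf_counter() - t0

    if little_time > big_time * SLOWDOWN_THRESHOLD:
        pin_threads(big_cores)       # too slow on little cores: revert
        return set(big_cores)
    return set(little_cores)         # within budget: decode on low-power cores
```

In a real engine the probe would wrap the actual decode kernel inside the thread pool; here `decode_step` is any callable standing in for one token-generation step.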
👥 Authors
Zhengxiang Huang
Shanghai Jiao Tong University
LLM System · ML System · On-Device Intelligence · 3D Computer Vision
Chaoyue Niu
Shanghai Jiao Tong University
Device-Cloud ML · On-Device Intelligence
Zhaode Wang
Alibaba Group
Jiarui Xue
Shanghai Jiao Tong University
Hanming Zhang
Shanghai Jiao Tong University
Yugang Wang
Shanghai Jiao Tong University
Zewei Xin
Shanghai Jiao Tong University
Xiaotang Jiang
Alibaba Group
Chengfei Lv
Alibaba Group
Fan Wu
Shanghai Jiao Tong University
Guihai Chen
Professor of Computer Science, Computer Science and Technology