🤖 AI Summary
Mobile LLMs suffer from high energy consumption during decoding on resource-constrained mobile devices, a problem exacerbated by the inability to modify system-level configurations (e.g., no root access or kernel-level tuning). To address this, we propose Adaptive Energy-Centric Core Selection (AECS), a user-space, inference-engine-level optimization that dynamically schedules memory-intensive decoding operations onto energy-efficient CPU cores, leveraging hardware heterogeneity and runtime workload awareness under strict latency constraints. AECS is integrated into the lightweight MNN framework, yielding the MNN-AECS inference engine. To our knowledge, it is the first fully user-space, decoding-stage-specific energy optimization for mobile LLMs. Evaluations across five Android and two iOS devices with five mainstream LLMs show that MNN-AECS achieves a 23% average energy reduction over baseline MNN without sacrificing inference speed; compared to llama.cpp and similar engines, it reduces energy consumption by 39%–78% while accelerating inference by 12%–363%.
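The core-selection idea above can be sketched as a simple decision rule: run decoding on the energy-efficient cores whenever the measured slowdown relative to the performance cores stays within a latency budget. This is a minimal illustrative sketch, not the paper's actual algorithm; the function name, parameters, and the fixed slowdown threshold are all assumptions.

```python
def select_cores(tps_perf: float, tps_eff: float,
                 energy_perf: float, energy_eff: float,
                 max_slowdown: float = 0.1) -> str:
    """Pick a core set for the decode phase.

    tps_*    : measured decode throughput (tokens/s) on each core set
    energy_* : measured energy per token (J) on each core set
    max_slowdown : acceptable relative slowdown (assumed 10% here)
    """
    # Relative slowdown of the efficiency cores vs. the performance cores.
    slowdown = 1.0 - tps_eff / tps_perf
    # Prefer efficiency cores only if they are both fast enough and cheaper.
    if slowdown <= max_slowdown and energy_eff < energy_perf:
        return "efficiency"
    return "performance"


# Efficiency cores are 5% slower but use far less energy -> pick them.
print(select_cores(tps_perf=30.0, tps_eff=28.5,
                   energy_perf=5.0, energy_eff=2.0))   # efficiency
# A 33% slowdown exceeds the budget -> stay on performance cores.
print(select_cores(tps_perf=30.0, tps_eff=20.0,
                   energy_perf=5.0, energy_eff=2.0))   # performance
```

In a real user-space engine, the chosen core set would then be applied by pinning worker threads (e.g., via `sched_setaffinity` on Linux/Android), which needs no root access; the actual AECS logic inside MNN also adapts to runtime workload rather than using a fixed threshold.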
📝 Abstract
As demand for on-device Large Language Model (LLM) inference grows, energy efficiency has become a major concern, especially for battery-limited mobile devices. Our analysis shows that the memory-bound LLM decode phase dominates energy use, yet most existing work focuses on accelerating the prefill phase and neglects energy. We introduce Adaptive Energy-Centric Core Selection (AECS) and integrate it into MNN to create its energy-efficient version, MNN-AECS, the first engine-level system solution for energy-efficient LLM decoding that requires neither root access nor OS modifications. MNN-AECS is designed to reduce LLM decoding energy while keeping decode speed within an acceptable slowdown threshold by dynamically selecting low-power CPU cores. MNN-AECS is evaluated across 5 Android and 2 iOS devices on 5 popular LLMs of various sizes. Compared to the original MNN, MNN-AECS cuts energy use by 23% without slowdown, averaged over all 7 devices and 4 datasets. Against other engines, including llama.cpp, executorch, mllm, and MediaPipe, MNN-AECS delivers 39% to 78% energy savings and 12% to 363% speedup on average.