Joint Optimization of Memory Frequency, Computing Frequency, Transmission Power and Task Offloading for Energy-efficient DNN Inference

📅 2025-09-22
🤖 AI Summary
To address high latency and energy consumption in DNN inference on resource-constrained devices, this paper proposes a holistic energy-efficiency co-optimization framework jointly tuning memory frequency, computation frequency, wireless transmission power, and task offloading decisions. Departing from conventional CPU/GPU-only DVFS approaches, we are the first to systematically model and empirically validate the critical impact of coordinated memory–computation frequency scaling on DNN inference energy efficiency. We formulate the problem as a mixed-integer nonlinear program (MINLP) integrating multi-dimensional resource scheduling. Our solution combines model-driven optimization with a lightweight data-driven component, unifying dynamic voltage and frequency scaling (DVFS), memory frequency scaling, and edge offloading. Experiments across local and collaborative inference scenarios demonstrate that our method reduces energy consumption by 32.7% and latency by 24.5% on average over baseline approaches, significantly improving the energy-delay trade-off for edge AI inference.

📝 Abstract
Deep neural networks (DNNs) have been widely applied in diverse applications, but high latency and energy overhead are unavoidable on resource-constrained devices. To address this challenge, most researchers focus on dynamic voltage and frequency scaling (DVFS), which balances latency and energy consumption by adjusting the computing frequency of processors. However, memory frequency, which also plays a significant role in inference time and energy consumption, is usually ignored and not fully exploited for efficient DNN inference. In this paper, we first investigate the impact of jointly scaling memory frequency and computing frequency on inference time and energy consumption, using a combined model-based and data-driven method. Combining the model with fitted parameters for different DNN models, we then give a preliminary analysis of the effects of adjusting memory frequency and computing frequency simultaneously. Finally, simulation results in local-inference and cooperative-inference cases further validate the effectiveness of jointly scaling memory and computing frequency to reduce device energy consumption.
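To make the joint-scaling idea concrete, the following is an illustrative sketch (not the paper's actual model): choose the (computing frequency, memory frequency) pair that minimizes inference energy subject to a latency deadline. Latency is modeled roofline-style as the maximum of a compute-bound and a memory-bound term, and dynamic power is assumed to grow roughly cubically with frequency, a standard DVFS approximation. All constants and the candidate frequency grids below are made up for illustration.

```python
from itertools import product

C_OPS = 2.0e9            # compute operations per inference (assumed)
M_BYTES = 1.0e9          # memory traffic per inference in bytes (assumed)
K_C, K_M = 1e-28, 5e-29  # dynamic-power coefficients, compute/memory (assumed)
P_STATIC = 0.5           # static power in watts (assumed)
DEADLINE = 1.5           # latency constraint in seconds (assumed)

F_COMP = [0.8e9, 1.2e9, 1.6e9, 2.0e9]  # candidate computing frequencies (Hz)
F_MEM = [0.8e9, 1.6e9, 2.4e9, 3.2e9]   # candidate memory frequencies (Hz)

def latency(f_c, f_m):
    """Roofline-style latency: limited by the slower of compute and memory."""
    return max(C_OPS / f_c, M_BYTES / f_m)

def energy(f_c, f_m):
    """Energy = (dynamic compute + dynamic memory + static power) * latency."""
    p_dyn = K_C * f_c**3 + K_M * f_m**3
    return (p_dyn + P_STATIC) * latency(f_c, f_m)

# Exhaustive search over the small frequency grid, keeping feasible pairs.
feasible = [(f_c, f_m) for f_c, f_m in product(F_COMP, F_MEM)
            if latency(f_c, f_m) <= DEADLINE]
best = min(feasible, key=lambda fs: energy(*fs))
print(best, energy(*best))
```

Note how the optimum need not run memory at its highest frequency: once the memory-bound term drops below the compute-bound term, raising the memory frequency further only adds power without reducing latency, which is exactly the coupling the joint optimization exploits.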
Problem

Research questions and friction points this paper is trying to address.

Optimizing memory and computing frequency scaling for efficient DNN inference
Reducing high latency and energy overhead on resource-constrained devices
Joint optimization of multiple parameters including transmission power and task offloading
Innovation

Methods, ideas, or system contributions that make the work stand out.

Jointly optimizes memory and computing frequency scaling
Combines model-based and data-driven parameter fitting
Validates approach through local and cooperative inference simulations
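The "model-based and data-driven parameter fitting" bullet can be sketched as follows, with the model form assumed for illustration: posit a latency model T(f_c, f_m) = a/f_c + b/f_m and fit a and b by least squares from measured (frequency, latency) samples, as one might profile a DNN on a real device. The sample data below is synthetic.

```python
def fit_latency_model(samples):
    """Solve the 2x2 normal equations for T = a/f_c + b/f_m by least squares."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for f_c, f_m, t in samples:
        x1, x2 = 1.0 / f_c, 1.0 / f_m
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        r1 += x1 * t
        r2 += x2 * t
    det = s11 * s22 - s12 * s12
    a = (r1 * s22 - r2 * s12) / det
    b = (r2 * s11 - r1 * s12) / det
    return a, b

# Synthetic "measurements" generated from a = 2e9, b = 1e9 (assumed values);
# a real workflow would replace these with profiled latencies.
samples = [(f_c, f_m, 2e9 / f_c + 1e9 / f_m)
           for f_c in (0.8e9, 1.2e9, 2.0e9)
           for f_m in (0.8e9, 1.6e9, 3.2e9)]
a, b = fit_latency_model(samples)
```

Since the synthetic samples are noise-free, the fit recovers the generating coefficients; with real measurements the residual would indicate how well the assumed model form matches the device.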
Yunchu Han
Tsinghua University
Edge Intelligence · Wireless Communication · Green AI · Mobile Edge Computing
Zhaojun Nan
Beijing National Research Center for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Sheng Zhou
Beijing National Research Center for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Zhisheng Niu
Professor of Electronic Engineering, Tsinghua University
Green Communication · Radio Resource Management · Queueing Theory