🤖 AI Summary
To address resource constraints and hard real-time requirements when multiple mobile devices offload DNN inference to GPU-enabled edge servers, this paper proposes a joint optimization framework that simultaneously determines task offloading decisions, DNN-layer-granularity subtask partitioning, GPU dynamic batching, and DVFS-based frequency scaling. The problem is formulated as a mixed-integer nonlinear program (MINLP), which the authors state is the first such model for this setting, and solved with the low-complexity J-DOB algorithm, which carries theoretical guarantees of near-optimal performance. Compared with purely local execution, J-DOB reduces total system energy consumption by up to 51.30% and 45.27% under identical and heterogeneous deadline settings, respectively, while improving both energy efficiency and end-to-end latency compliance. The core contribution is the coupled modeling and efficient joint optimization of four dimensions: layer partitioning, batch scheduling, DVFS, and offloading decisions.
📝 Abstract
With the growing integration of artificial intelligence into mobile applications, mobile devices generate a substantial number of deep neural network (DNN) inference requests every day. Serving these requests is challenging due to limited device resources and strict latency requirements, and edge-device co-inference has emerged as an effective paradigm to address both issues. In this study, we focus on a scenario where multiple mobile devices offload inference tasks to an edge server equipped with a graphics processing unit (GPU). For finer control over offloading and scheduling, inference tasks are partitioned into smaller sub-tasks, and GPU batch processing is employed to boost throughput and improve energy efficiency. This work investigates the problem of minimizing total energy consumption while meeting hard latency constraints. We propose a low-complexity Joint DVFS, Offloading, and Batching strategy (J-DOB) to solve this problem. The effectiveness of the proposed algorithm is validated through extensive experiments across varying numbers of users and deadline constraints. Results show that J-DOB reduces energy consumption by up to 51.30% and 45.27% under identical and different deadlines, respectively, compared to local computing.