🤖 AI Summary
To address resource constraints and hard real-time requirements when multiple mobile devices offload DNN inference to GPU-enabled edge servers, this paper proposes a joint optimization framework that simultaneously determines task offloading decisions, DNN-layer-granularity subtask partitioning, GPU dynamic batching, and DVFS-based frequency scaling. The problem is formulated as a mixed-integer nonlinear program (MINLP), which the authors state is the first such model for this setting, and solved with the low-complexity J-DOB algorithm, which carries theoretical guarantees of near-optimal performance. Compared with purely local execution, J-DOB reduces total system energy consumption by up to 51.30% and 45.27% under identical and heterogeneous deadline settings, respectively, while improving both energy efficiency and end-to-end latency compliance. The core contribution is the coupled modeling and efficient joint optimization of four dimensions: layer partitioning, batch scheduling, DVFS, and offloading decisions.
📝 Abstract
With the growing integration of artificial intelligence into mobile applications, mobile devices generate a substantial number of deep neural network (DNN) inference requests every day. Serving these requests is challenging due to limited device resources and strict latency requirements, and edge-device co-inference has emerged as an effective paradigm to address both issues. In this study, we focus on a scenario where multiple mobile devices offload inference tasks to an edge server equipped with a graphics processing unit (GPU). For finer control over offloading and scheduling, inference tasks are partitioned into smaller sub-tasks, and GPU batch processing is employed to boost throughput and improve energy efficiency. This work investigates the problem of minimizing total energy consumption while meeting hard latency constraints. We propose a low-complexity Joint DVFS, Offloading, and Batching strategy (J-DOB) to solve this problem. The effectiveness of the proposed algorithm is validated through extensive experiments across varying numbers of users and deadline constraints. Results show that J-DOB reduces energy consumption by up to 51.30% and 45.27% under identical and different deadlines, respectively, compared to local computing.