🤖 AI Summary
This work addresses the end-edge collaborative offloading problem for CNN inference in mobile robotics and autonomous driving scenarios over mobile networks, aiming to minimize end-to-end latency and on-device energy consumption. We propose a dynamic offloading mechanism that integrates early exits and inter-layer splits, enabling fine-grained partial or full offloading. We design the first early-exit-enabled, layer-split CNN architecture tailored to real-world traffic sign recognition, and formulate a measurement-driven, tri-objective optimization model that jointly minimizes latency, energy, and accuracy loss. Experimental results demonstrate that, compared with purely local inference, our approach significantly reduces both end-to-end processing latency and terminal energy consumption while preserving classification accuracy. Furthermore, we derive deployable lightweight models for latency and energy prediction.
📝 Abstract
We focus on computation offloading of applications based on convolutional neural networks (CNNs) from moving devices, such as mobile robots or autonomous vehicles, to Multi-Access Edge Computing (MEC) servers via a mobile network. To reduce overall CNN inference time, we design and implement a CNN with early exits and splits, allowing flexible partial or full offloading of CNN inference. Through real-world experiments, we analyze the impact of CNN inference offloading on the total CNN processing delay, energy consumption, and classification accuracy in a practical road sign recognition task. The results confirm that offloading a CNN with early exits and splits can significantly reduce both total processing delay and energy consumption compared to full local processing, without impairing classification accuracy. Based on the results of these real-world experiments, we derive practical models for the energy consumption and total processing delay associated with offloading a CNN with early exits and splits.
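The offloading logic described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the stage callables (`head`, `exit_branch`, `tail`), the `offload` transport stub, and the confidence threshold are all hypothetical placeholders. The idea is that the device runs the CNN's head locally up to the split point, takes the early exit if the exit classifier is confident enough, and otherwise ships the intermediate features to the MEC server for the remaining layers.

```python
import numpy as np

# Assumed tuning parameter for the early-exit decision (not from the paper).
CONF_THRESHOLD = 0.8

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def run_split_cnn(x, head, exit_branch, tail, offload):
    """Early-exit / split inference sketch.

    head, exit_branch, tail: callables standing in for CNN stages
    (local layers, the early-exit classifier, and the server-side layers).
    offload(fn, features): models sending features over the network and
    running fn on the MEC server.
    Returns (predicted_label, path_taken).
    """
    features = head(x)                     # local layers up to the split point
    exit_probs = softmax(exit_branch(features))
    if exit_probs.max() >= CONF_THRESHOLD:
        # Confident enough: finish on-device via the early exit.
        return int(exit_probs.argmax()), "local-early-exit"
    # Not confident: offload the remaining layers to the edge server.
    logits = offload(lambda f: tail(f), features)
    return int(np.argmax(softmax(logits))), "offloaded"

# Usage with dummy stages: a peaked exit classifier stays local,
# a flat (uncertain) one triggers offloading.
head = lambda x: x
tail = lambda f: np.array([0.0, 0.0, 9.0])
offload = lambda fn, f: fn(f)  # stand-in for the network round trip

label, path = run_split_cnn(np.zeros(3), head,
                            lambda f: np.array([5.0, 0.0, 0.0]), tail, offload)
# → (0, "local-early-exit")

label2, path2 = run_split_cnn(np.zeros(3), head,
                              lambda f: np.array([0.0, 0.0, 0.0]), tail, offload)
# → (2, "offloaded")
```

In a real deployment, the decision would also weigh the measured network delay and transmit energy against the remaining local compute cost, which is exactly the trade-off the delay and energy models in the paper are meant to capture.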