🤖 AI Summary
This work addresses the end-edge collaborative offloading problem for CNN inference in mobile robotics and autonomous driving scenarios over mobile networks, aiming to minimize end-to-end latency and on-device energy consumption. We propose a dynamic offloading mechanism that integrates early exits and inter-layer splits, enabling fine-grained partial or full offloading. We design the first early-exit-enabled, layer-split CNN architecture tailored to real-world traffic sign recognition, and formulate a measurement-driven, tri-objective optimization model that jointly minimizes latency, energy, and accuracy loss. Experimental results demonstrate that, compared with purely local inference, our approach significantly reduces both end-to-end processing latency and terminal energy consumption while preserving classification accuracy. Furthermore, we derive deployable lightweight models for latency and energy prediction.
📝 Abstract
We focus on computation offloading of applications based on convolutional neural networks (CNNs) from moving devices, such as mobile robots or autonomous vehicles, to Multi-Access Edge Computing (MEC) servers via a mobile network. To reduce overall CNN inference time, we design and implement a CNN with early exits and splits, allowing flexible partial or full offloading of CNN inference. Through real-world experiments, we analyze the impact of CNN inference offloading on the total CNN processing delay, energy consumption, and classification accuracy in a practical road sign recognition task. The results confirm that offloading a CNN with early exits and splits can significantly reduce both total processing delay and energy consumption compared to full local processing, without impairing classification accuracy. Based on the results of these real-world experiments, we derive practical models for the energy consumption and total processing delay associated with offloading a CNN with early exits and splits.
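The offloading logic described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the stage callables (`head`, `exit_branch`, `tail`), the `offload` transport stub, and the confidence threshold are all hypothetical placeholders. The idea is that the device runs the CNN's head locally up to the split point, takes the early exit if the exit classifier is confident enough, and otherwise ships the intermediate features to the MEC server for the remaining layers.

```python
import numpy as np

# Assumed tuning parameter for the early-exit decision (not from the paper).
CONF_THRESHOLD = 0.8

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def run_split_cnn(x, head, exit_branch, tail, offload):
    """Early-exit / split inference sketch.

    head, exit_branch, tail: callables standing in for CNN stages
    (local layers, the early-exit classifier, and the server-side layers).
    offload(fn, features): models sending features over the network and
    running fn on the MEC server.
    Returns (predicted_label, path_taken).
    """
    features = head(x)                     # local layers up to the split point
    exit_probs = softmax(exit_branch(features))
    if exit_probs.max() >= CONF_THRESHOLD:
        # Confident enough: finish on-device via the early exit.
        return int(exit_probs.argmax()), "local-early-exit"
    # Not confident: offload the remaining layers to the edge server.
    logits = offload(lambda f: tail(f), features)
    return int(np.argmax(softmax(logits))), "offloaded"

# Usage with dummy stages: a peaked exit classifier stays local,
# a flat (uncertain) one triggers offloading.
head = lambda x: x
tail = lambda f: np.array([0.0, 0.0, 9.0])
offload = lambda fn, f: fn(f)  # stand-in for the network round trip

label, path = run_split_cnn(np.zeros(3), head,
                            lambda f: np.array([5.0, 0.0, 0.0]), tail, offload)
# → (0, "local-early-exit")

label2, path2 = run_split_cnn(np.zeros(3), head,
                              lambda f: np.array([0.0, 0.0, 0.0]), tail, offload)
# → (2, "offloaded")
```

In a real deployment, the decision would also weigh the measured network delay and transmit energy against the remaining local compute cost, which is exactly the trade-off the delay and energy models in the paper are meant to capture.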