Exploring the Boundaries of On-Device Inference: When Tiny Falls Short, Go Hierarchical

📅 2024-07-10
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
To address the limited inference capability of resource-constrained edge devices, this paper systematically compares hierarchical inference (HI) against pure on-device inference in terms of accuracy, latency, and energy consumption. Existing HI studies overlook device-side latency and energy overhead and do not jointly model three heterogeneous dimensions of ML systems: hardware, network connectivity, and models. To bridge this gap, the authors conduct a multi-dimensional empirical evaluation of HI on real embedded hardware, spanning five devices and three image classification datasets. They further propose Early Exit with HI (EE-HI), a hybrid mechanism that coordinates a lightweight local model with a remote server: samples that an early exit classifies confidently skip the rest of on-device inference, reducing HI's fixed per-sample overhead. Experiments show that HI reduces latency by up to 73% and device energy consumption by up to 77% compared with pure on-device inference; EE-HI further reduces latency by up to 59.7% and device energy consumption by up to 60.4% relative to HI.

📝 Abstract
On-device inference holds great potential for increased energy efficiency, responsiveness, and privacy in edge ML systems. However, due to less capable ML models that can be embedded in resource-limited devices, use cases are limited to simple inference tasks such as visual keyword spotting, gesture recognition, and predictive analytics. In this context, the Hierarchical Inference (HI) system has emerged as a promising solution that augments the capabilities of the local ML by offloading selected samples to an edge server or cloud for remote ML inference. Existing works demonstrate through simulation that HI improves accuracy. However, they do not account for the latency and energy consumption on the device, nor do they consider three key heterogeneous dimensions that characterize ML systems: hardware, network connectivity, and models. In contrast, this paper systematically compares the performance of HI with on-device inference based on measurements of accuracy, latency, and energy for running embedded ML models on five devices with different capabilities and three image classification datasets. For a given accuracy requirement, the HI systems we designed achieved up to 73% lower latency and up to 77% lower device energy consumption than an on-device inference system. The key to building an efficient HI system is the availability of small-size, reasonably accurate on-device models whose outputs can be effectively differentiated for samples that require remote inference. Despite the performance gains, HI requires on-device inference for all samples, which adds a fixed overhead to its latency and energy consumption. Therefore, we design a hybrid system, Early Exit with HI (EE-HI), and demonstrate that compared to HI, EE-HI reduces the latency by up to 59.7% and lowers the device's energy consumption by up to 60.4%.
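The abstract notes that the key to an efficient HI system is a small on-device model whose outputs can be effectively differentiated for samples that require remote inference. A common way to realize this routing is a softmax-confidence threshold, sketched below. The threshold value, model stubs, and function names are illustrative assumptions, not the paper's implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_infer(sample, local_model, remote_model, threshold=0.8):
    """Run the small on-device model on every sample; offload only
    low-confidence samples to the remote model.

    `local_model` returns class logits, `remote_model` returns a label,
    and `threshold` is a hypothetical tuning parameter.
    """
    probs = softmax(local_model(sample))
    confidence = max(probs)
    if confidence >= threshold:
        # Local prediction is confident enough: keep it on-device.
        return probs.index(confidence), "local"
    # Otherwise offload the sample for remote inference.
    return remote_model(sample), "remote"
```

In practice the threshold would be tuned per dataset and device to meet a target accuracy, trading off against the latency and energy cost of offloading.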
Problem

Research questions and friction points this paper is trying to address.

Enhancing edge ML with hierarchical inference for complex tasks
Reducing latency and energy in on-device ML systems
Optimizing hybrid systems for accuracy and efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-device, multi-dataset evaluation of Hierarchical Inference for edge ML
Hybrid Early Exit with HI (EE-HI) design
Joint reduction of latency and device energy consumption at a given accuracy
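The EE-HI idea above — let an early exit answer before the full on-device model runs, and fall back to standard HI otherwise — can be sketched as a two-stage decision. Both thresholds and all function names here are hypothetical; the paper's actual exit placement and decision rule may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ee_hi_infer(sample, exit_head, local_model, remote_model,
                exit_threshold=0.9, offload_threshold=0.8):
    """Early Exit with HI (EE-HI) sketch.

    Stage 1: an intermediate exit head may return a label before the full
    on-device model runs, avoiding HI's fixed per-sample overhead.
    Stage 2: otherwise fall back to standard HI — run the full local model
    and offload to the remote model if it is still unsure.
    """
    probs = softmax(exit_head(sample))
    if max(probs) >= exit_threshold:
        return probs.index(max(probs)), "early-exit"
    probs = softmax(local_model(sample))
    if max(probs) >= offload_threshold:
        return probs.index(max(probs)), "local"
    return remote_model(sample), "remote"
```

Easy samples leave at the exit head, moderate ones finish on-device, and only hard samples pay the network cost of remote inference.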
Adarsh Prasad Behera
IMDEA Networks Institute, Madrid, Spain
Paulius Daubaris
University of Helsinki, Helsinki, Finland
Inaki Bravo
IMDEA Networks Institute, Madrid, Spain
José Gallego
IMDEA Networks Institute, Madrid, Spain
Roberto Morabito
EURECOM
Internet of Things, Edge Computing, TinyML, Edge AI, Networked AI Systems
Joerg Widmer
Research Professor, IMDEA Networks, Madrid, Spain
Millimeter-Wave Communications, Millimeter-Wave Networks, Wireless Networking, Computer Networks, Communications
J. Champati
IMDEA Networks Institute, Madrid, Spain