Hardware-aware Neural Architecture Search of Early Exiting Networks on Edge Accelerators

📅 2025-12-04
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Deploying large-scale models on edge accelerators faces a fundamental trade-off among accuracy, energy efficiency, and inference latency. Method: This paper proposes a hardware-aware neural architecture search (NAS) framework that jointly optimizes the backbone architecture and the early-exit point distribution of early-exit neural networks (EENNs), while explicitly modeling the impact of quantization error and hardware resource constraints on inference performance. The method integrates quantization-aware training, fine-grained hardware performance modeling, and multi-objective optimization to balance latency, energy consumption, and accuracy. Contribution/Results: Evaluated on CIFAR-10, the framework reduces computational cost by over 50% compared to conventional static models, significantly improving dynamic inference efficiency and energy-delay efficiency on edge devices.
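To make the core mechanism concrete: an EENN attaches classifiers ("exit heads") to intermediate backbone stages and stops inference at the first head that is confident enough. The sketch below illustrates that control flow only; the head functions and the 0.9 threshold are illustrative assumptions, not the paper's configuration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def early_exit_infer(x, exit_heads, threshold=0.9):
    """Run a chain of exit heads; stop at the first whose top-class
    confidence clears `threshold`. Returns (prediction, exit_index).
    `exit_heads` is a list of callables mapping the input to class logits;
    in a real EENN these would be backbone stages with attached classifiers
    (hypothetical interface, for illustration only)."""
    for i, head in enumerate(exit_heads):
        probs = softmax(head(x))
        conf = max(probs)
        # The final head always exits, so every input gets a prediction.
        if conf >= threshold or i == len(exit_heads) - 1:
            return probs.index(conf), i

# Toy usage: an "easy" input exits at the first head,
# a near-uniform (uncertain) one falls through to the next.
easy_head = lambda x: [5.0, 0.0, 0.0]    # very confident logits
unsure_head = lambda x: [0.1, 0.0, 0.2]  # nearly uniform logits
pred, exit_at = early_exit_infer(None, [easy_head, unsure_head])
# exits at head 0 with class 0
```

The NAS framework described above then decides where along the backbone such heads should be placed, which is what makes the placement a search-space dimension rather than a hand-tuned choice.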

πŸ“ Abstract
Advancements in high-performance computing and cloud technologies have enabled the development of increasingly sophisticated Deep Learning (DL) models. However, the growing demand for embedded intelligence at the edge imposes stringent computational and energy constraints, challenging the deployment of these large-scale models. Early Exiting Neural Networks (EENN) have emerged as a promising solution, allowing dynamic termination of inference based on input complexity to enhance efficiency. Despite their potential, EENN performance is highly influenced by the heterogeneity of edge accelerators and the constraints imposed by quantization, affecting accuracy, energy efficiency, and latency. Yet, research on the automatic optimization of EENN design for edge hardware remains limited. To bridge this gap, we propose a hardware-aware Neural Architecture Search (NAS) framework that systematically integrates the effects of quantization and hardware resource allocation to optimize the placement of early exit points within a network backbone. Experimental results on the CIFAR-10 dataset demonstrate that our NAS framework can discover architectures that achieve over a 50% reduction in computational costs compared to conventional static networks, making them more suitable for deployment in resource-constrained edge environments.
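The abstract's multi-objective search can be pictured as scoring each candidate architecture by its accuracy against its expected dynamic cost, where the expected cost of an EENN is the exit-probability-weighted latency. The sketch below is a minimal scalarized version; the weights, the budget penalty, and all function names are assumptions, since the paper's exact objective is not reproduced here.

```python
def expected_latency(exit_probs, cumulative_latency):
    """Expected per-input latency of an EENN: each sample pays the latency
    accumulated up to the exit it takes, weighted by the fraction of
    inputs exiting there."""
    return sum(p * t for p, t in zip(exit_probs, cumulative_latency))

def fitness(accuracy, latency_ms, energy_mj, latency_budget_ms,
            weights=(1.0, 0.3, 0.3)):
    """Scalarized multi-objective score for one candidate (illustrative
    weighting). Higher is better: reward accuracy, penalize normalized
    latency and energy, and hard-penalize candidates that exceed the
    hardware latency budget, mimicking a resource constraint."""
    w_acc, w_lat, w_en = weights
    score = (w_acc * accuracy
             - w_lat * latency_ms / latency_budget_ms
             - w_en * energy_mj)
    if latency_ms > latency_budget_ms:
        score -= 10.0  # constraint-violation penalty
    return score

# Toy usage: 60% of inputs exit early (2 ms), 30% mid (5 ms), 10% late (9 ms).
lat = expected_latency([0.6, 0.3, 0.1], [2.0, 5.0, 9.0])  # 3.6 ms expected
```

A search loop would evaluate `fitness` for each sampled backbone/exit-placement pair, here with latency and energy taken from a hardware performance model rather than measured on device.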
Problem

Research questions and friction points this paper is trying to address.

Optimizes early exit placement in neural networks for edge accelerators
Addresses quantization and hardware constraints in early exiting networks
Reduces computational costs for deployment in resource-constrained environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hardware-aware NAS optimizes early exit placement
Integrates quantization and hardware resource allocation effects
Reduces computational costs by over 50% for edge deployment