Hardware-Aware DNN Compression for Homogeneous Edge Devices

📅 2025-01-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing model compression methods struggle to ensure consistent inference performance across identical edge devices because of hardware-level heterogeneity arising from manufacturing variations, environmental fluctuations, and device aging. To address this, we propose Homogeneous Device-Aware Pruning (HDAP), the first hardware-aware compression framework that explicitly models performance drift among same-model devices. HDAP comprises three key components: (i) hardware-response-based device clustering, (ii) surrogate-model-driven latency prediction, and (iii) structured pruning jointly optimized for FLOPs and latency. Evaluated on ResNet50 and MobileNetV1 under a 1.0G FLOPs constraint, HDAP achieves a 2.86× average latency speedup, outperforming state-of-the-art methods, and demonstrates strong cross-device robustness and scalability.

📝 Abstract
Deploying deep neural networks (DNNs) across homogeneous edge devices (devices sharing the same manufacturer SKU) often assumes identical performance among them. In practice, however, once a device model is widely deployed, the performance of individual devices diverges after a period of running. This is caused by differences in user configurations, environmental conditions, manufacturing variances, battery degradation, etc. Existing DNN compression methods have not taken this scenario into consideration and cannot guarantee good compression results across all homogeneous edge devices. To address this, we propose Homogeneous-Device Aware Pruning (HDAP), a hardware-aware DNN compression framework explicitly designed for homogeneous edge devices, aiming to achieve optimal average performance of the compressed model across all devices. Because hardware-aware evaluation on thousands or millions of homogeneous edge devices is prohibitively time-consuming, HDAP partitions the devices into a small number of device clusters, which dramatically reduces the number of devices to evaluate, and replaces real-time hardware evaluation with surrogate-based evaluation. Experiments on ResNet50 and MobileNetV1 with the ImageNet dataset show that HDAP consistently achieves lower average inference latency than state-of-the-art methods, with substantial speedup gains (e.g., 2.86× speedup at 1.0G FLOPs for ResNet50) on the homogeneous device clusters. HDAP offers an effective, scalable solution for high-performance DNN deployment on homogeneous edge devices.
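The pipeline the abstract describes (cluster devices by hardware response, evaluate candidates through a surrogate per cluster, minimize the size-weighted average latency) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the synthetic latency profiles, the quantile-based clustering, and the linear per-op latency surrogate are all assumptions standing in for HDAP's actual components.

```python
import numpy as np

# Hypothetical data: per-device latency responses on 8 benchmark ops.
# Same-SKU devices drift apart over time (aging, thermals, battery, ...).
rng = np.random.default_rng(0)
profiles = rng.normal(loc=10.0, scale=1.5, size=(1000, 8))

# (i) Device clustering: group devices by overall hardware response so only
# a few cluster representatives need evaluation. A simple quantile split
# stands in for the paper's clustering step.
mean_lat = profiles.mean(axis=1)
edges = np.quantile(mean_lat, [0.25, 0.5, 0.75])
labels = np.digitize(mean_lat, edges)  # 4 clusters, labels in {0,1,2,3}
reps = np.stack([profiles[labels == c].mean(axis=0) for c in range(4)])

# (ii) Surrogate evaluation: predict a candidate sub-network's latency on a
# cluster from its per-op work and the cluster's measured per-op cost,
# instead of running it on real hardware.
def predicted_latency(op_flops, rep_profile):
    # Linear proxy: latency ~ per-op FLOPs weighted by the cluster's
    # per-op cost (a stand-in for a learned surrogate model).
    return float(op_flops @ rep_profile)

# (iii) Pruning objective: size-weighted average predicted latency across
# clusters, which the pruning search would minimize under a FLOPs budget.
def average_latency(op_flops):
    sizes = np.bincount(labels, minlength=4)
    lats = np.array([predicted_latency(op_flops, r) for r in reps])
    return float(np.average(lats, weights=sizes))

candidate = np.full(8, 0.125)  # toy sub-network: uniform work per op
print(f"avg predicted latency: {average_latency(candidate):.2f}")
```

Averaging over cluster representatives rather than every device is what makes the objective tractable at fleet scale: the number of (surrogate) evaluations per pruning candidate drops from the device count to the cluster count.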
Problem

Research questions and friction points this paper is trying to address.

Model Compression
DNN Speed Inconsistency
Edge Devices
Innovation

Methods, ideas, or system contributions that make the work stand out.

HDAP
Model Compression
Performance Optimization
Kunlong Zhang
Department of Computer Science and Engineering, Southern University of Science and Technology
Guiying Li
Pengcheng Laboratory
Ning Lu
Department of Computer Science and Engineering, Southern University of Science and Technology
Peng Yang
Department of Computer Science and Engineering, Southern University of Science and Technology
Ke Tang
Department of Computer Science and Engineering, Southern University of Science and Technology