SigmaQuant: Hardware-Aware Heterogeneous Quantization Method for Edge DNN Inference

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deploying deep neural networks on edge devices is constrained by limited memory, energy, and computational resources. Conventional uniform quantization struggles to balance accuracy and efficiency, while existing heterogeneous quantization approaches either rely on time-consuming search procedures or lack hardware adaptability. This work proposes an adaptive, layer-wise heterogeneous quantization framework that dynamically allocates bit-widths based on a hardware-aware mechanism, integrating layer sensitivity analysis with resource constraint modeling. By avoiding exhaustive search, the method efficiently tailors quantization strategies to diverse edge hardware constraints. Experiments demonstrate that the proposed approach consistently outperforms state-of-the-art quantization schemes across multiple hardware platforms, maintaining high model accuracy even at low bit-widths while adhering to stringent resource limitations.
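To make the idea of allocating bit-widths from layer sensitivity under a resource constraint concrete, here is a minimal sketch. It is not the SigmaQuant algorithm, which this summary does not specify. It is a simple greedy heuristic under stated assumptions: each layer has one scalar sensitivity score, the only constraint is a weight-memory budget in bits, and the function name `allocate_bitwidths`, its parameters, and the candidate bit-widths are all hypothetical.

```python
def allocate_bitwidths(param_counts, sensitivities, budget_bits, choices=(8, 6, 4, 2)):
    """Greedy illustration only (not the paper's method).

    Every layer starts at the highest bit-width. While the weight memory
    exceeds the budget, lower the layer whose sensitivity per saved bit
    is smallest by one step.
    """
    choices = sorted(choices, reverse=True)
    level = [0] * len(param_counts)  # per-layer index into `choices`

    def total_bits():
        return sum(n * choices[l] for n, l in zip(param_counts, level))

    while total_bits() > budget_bits:
        best, best_cost = None, None
        for i, (n, s) in enumerate(zip(param_counts, sensitivities)):
            if level[i] + 1 >= len(choices):
                continue  # this layer is already at the lowest bit-width
            saved = n * (choices[level[i]] - choices[level[i] + 1])
            cost = s / saved  # assumed proxy: accuracy risk per bit saved
            if best is None or cost < best_cost:
                best, best_cost = i, cost
        if best is None:
            raise ValueError("budget cannot be met with the given bit-width choices")
        level[best] += 1
    return [choices[l] for l in level]


# Hypothetical three-layer model: parameter counts and sensitivity scores.
params = [1000, 10000, 500]
sens = [5.0, 1.0, 9.0]
bits = allocate_bitwidths(params, sens, budget_bits=55200)
```

In this toy setup the large, low-sensitivity middle layer absorbs all of the reduction, while the two small, sensitive layers stay at 8 bits. A real hardware-aware scheme would also have to model energy and latency, which this sketch ignores.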

📝 Abstract
Deep neural networks (DNNs) are essential for performing advanced tasks on edge or mobile devices, yet their deployment is often hindered by severe resource constraints, including limited memory, energy, and computational power. While uniform quantization provides a straightforward way to compress models and reduce hardware requirements, it fails to fully leverage the varying robustness across layers and often leads to accuracy degradation or suboptimal resource usage, particularly at low bitwidths. In contrast, heterogeneous quantization, which allocates different bitwidths to individual layers, can mitigate these drawbacks. Nonetheless, current heterogeneous quantization methods either require an extensive brute-force design-space search or lack the adaptability to meet different hardware conditions, such as memory size, energy budget, and latency requirements. To fill these gaps, this work introduces SigmaQuant, an adaptive layer-wise heterogeneous quantization framework designed to efficiently balance accuracy and resource usage for varied edge environments without exhaustive search.
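The trade-off between uniform and heterogeneous quantization described above can be illustrated with a small sketch. It assumes a symmetric, per-tensor uniform quantizer, which is a common textbook scheme and not necessarily the quantizer used in the paper. The helper names `quantize`, `mse`, and `model_size_bytes` and the sample values are hypothetical.

```python
def quantize(weights, bits):
    """Symmetric uniform quantization to signed `bits`-bit levels, then dequantize."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # step size between levels
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def model_size_bytes(param_counts, bitwidths):
    """Weight storage in bytes for the given per-layer bit-widths."""
    return sum(n * b for n, b in zip(param_counts, bitwidths)) / 8


weights = [0.5, -0.25, 0.1, -1.0]
err_8bit = mse(weights, quantize(weights, 8))
err_4bit = mse(weights, quantize(weights, 4))  # larger error at lower precision

layers = [1000, 10000, 500]                    # hypothetical parameter counts
uniform_size = model_size_bytes(layers, [8, 8, 8])
mixed_size = model_size_bytes(layers, [8, 4, 8])
```

Lowering the bit-width increases the quantization error, so a mixed assignment only pays off when the layers moved to low precision are the robust ones. In this toy example, dropping only the largest layer to 4 bits cuts weight storage from 11500 to 6500 bytes.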
Problem

Research questions and friction points this paper is trying to address.

heterogeneous quantization
edge DNN inference
hardware-aware
resource constraints
bitwidth allocation
Innovation

Methods, ideas, or system contributions that make the work stand out.

heterogeneous quantization
hardware-aware
edge DNN inference
adaptive bitwidth allocation
SigmaQuant