AI Summary
Neural network quantization at ultra-low bit-widths suffers from significant accuracy degradation due to the non-differentiability of discrete quantization operations, causing standard quantization-aware training (QAT), which relies on the straight-through estimator (STE), to ignore quantization-induced discretization errors. To address this, we propose a progressive element-wise gradient estimation framework. Our method introduces a logarithmic curriculum-driven mixed-precision replacement mechanism that jointly optimizes task loss and quantization discretization error. By integrating progressive variable substitution, element-level gradient calibration, and explicit discretization error modeling, we establish an end-to-end co-optimization framework. The approach is plug-and-play and fully compatible with diverse forward quantization strategies. Evaluated on CIFAR-10 and ImageNet, our method matches or even surpasses full-precision accuracy on ResNet and VGG models quantized to 2–4 bits, consistently outperforming state-of-the-art QAT approaches.
Abstract
Neural network quantization aims to reduce the bit-widths of weights and activations, making it a critical technique for deploying deep neural networks on resource-constrained hardware. Most Quantization-Aware Training (QAT) methods rely on the Straight-Through Estimator (STE) to address the non-differentiability of discretization functions, replacing their derivatives with that of the identity function. While effective, STE overlooks discretization errors between continuous and quantized values, which can degrade accuracy, especially at extremely low bit-widths. In this paper, we propose Progressive Element-wise Gradient Estimation (PEGE), a simple yet effective alternative to STE that can be seamlessly integrated with any forward-propagation method and improves quantized model accuracy. PEGE progressively replaces full-precision weights and activations with their quantized counterparts via a novel logarithmic curriculum-driven mixed-precision replacement strategy. It then formulates QAT as a co-optimization problem that simultaneously minimizes the task loss for prediction and the discretization error for quantization, providing a unified and generalizable framework. Extensive experiments on CIFAR-10 and ImageNet across various architectures (e.g., ResNet, VGG) demonstrate that PEGE consistently outperforms existing backpropagation methods and enables low-precision models to match or even exceed the accuracy of their full-precision counterparts.
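The abstract describes two core ingredients: a logarithmic curriculum that progressively swaps full-precision elements for quantized ones, and a loss that jointly penalizes task error and discretization error. The sketch below illustrates one plausible reading of these ideas in PyTorch. The schedule form, the uniform quantizer, the element-wise masking, and the weighting `lam` are all illustrative assumptions, not the paper's actual implementation.

```python
import math
import torch

def replacement_fraction(step, total_steps):
    # Hypothetical logarithmic curriculum: the fraction of elements
    # replaced by quantized values grows quickly early in training
    # and saturates at 1.0 (assumed form, not the paper's schedule).
    return min(1.0, math.log(1 + step) / math.log(1 + total_steps))

def quantize(x, bits=2):
    # Simple symmetric uniform quantizer as a stand-in for whatever
    # forward quantization scheme is plugged in.
    scale = x.abs().max() / (2 ** (bits - 1) - 1) + 1e-8
    return torch.round(x / scale) * scale

def mixed_precision(w, frac, bits=2):
    # Element-wise replacement: a random subset of entries uses
    # quantized values while the rest stay full precision, so
    # gradients flow through the continuous entries without STE.
    mask = (torch.rand_like(w) < frac).float()
    return mask * quantize(w, bits) + (1.0 - mask) * w

def total_loss(task_loss, w, bits=2, lam=0.1):
    # Co-optimization: task loss plus an explicit discretization
    # error term (lam is an assumed trade-off weight).
    disc_err = (w - quantize(w, bits)).pow(2).mean()
    return task_loss + lam * disc_err
```

Under this reading, early in training most elements remain continuous and trainable, while late in training nearly all elements are quantized, so the network converges toward a model whose forward pass matches the deployed low-bit model.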