AI Summary
To address data scarcity and accumulated quantization errors in low-bit quantization for edge devices, this paper proposes QuaRC, an efficient quantization-aware training (QAT) framework. QuaRC introduces two key innovations: (1) a highly representative coreset constructed via a relative entropy score, which drastically reduces the amount of training data required; and (2) a cascaded inter-layer error-correction mechanism that aligns the intermediate feature outputs of the quantized model with those of the full-precision model, mitigating accuracy degradation under extreme data scarcity. Using only 1% of ImageNet-1K (roughly 13k samples), QuaRC improves Top-1 accuracy by 5.72% over the prior state of the art for 2-bit ResNet-18, substantially outperforming conventional QAT methods. By jointly optimizing data efficiency and quantization robustness, QuaRC enables high-accuracy, ultra-low-bit model training in resource-constrained edge environments.
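The relative-entropy scoring idea can be made concrete. The summary does not give the exact formula, so the sketch below is one plausible reading: score each sample by the KL divergence between the full-precision and quantized models' predictive distributions, then keep the samples where quantization distorts the prediction most. The function names, the use of logits, and the `fraction` parameter are illustrative assumptions, not the paper's API.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def relative_entropy_scores(fp_logits, q_logits, eps=1e-12):
    """Per-sample KL divergence D(p_fp || p_q) between the full-precision
    and quantized models' output distributions (one plausible reading of
    the paper's Relative Entropy Score)."""
    p = softmax(fp_logits)
    q = softmax(q_logits)
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)

def select_coreset(fp_logits, q_logits, fraction=0.01):
    """Keep the fraction of samples whose quantization error is largest."""
    scores = relative_entropy_scores(fp_logits, q_logits)
    k = max(1, int(len(scores) * fraction))
    return np.argsort(scores)[::-1][:k]  # indices of the top-k samples
```

Samples whose predictions are unchanged by quantization score near zero and are dropped; the retained subset concentrates the quantization error the QAT phase needs to see.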
Abstract
With the development of mobile and edge computing, demand for low-bit quantized models on edge devices is growing to enable efficient deployment. To enhance performance, it is often necessary to retrain quantized models on edge data. However, due to privacy concerns, certain sensitive data can only be processed on edge devices, which makes Quantization-Aware Training (QAT) on the device itself an effective solution. Nevertheless, traditional QAT relies on the complete dataset for training, which incurs a huge computational cost. Coreset selection techniques can mitigate this issue by training on the most representative subsets. However, existing methods struggle to eliminate the model's quantization errors when using small-scale datasets (e.g., only 10% of the data), leading to significant performance degradation. To address these issues, we propose QuaRC, a QAT framework with coresets on edge devices that consists of two main phases. In the coreset selection phase, QuaRC introduces the "Relative Entropy Score" to identify the subsets that most effectively capture the model's quantization errors. In the training phase, QuaRC employs a Cascaded Layer Correction strategy to align the intermediate-layer outputs of the quantized model with those of the full-precision model, effectively reducing quantization errors in the intermediate layers. Experimental results demonstrate the effectiveness of our approach: for instance, when quantizing ResNet-18 to 2-bit with a 1% data subset, QuaRC achieves a 5.72% improvement in Top-1 accuracy on the ImageNet-1K dataset compared to state-of-the-art techniques.
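To illustrate what aligning intermediate layers can look like, the sketch below runs a toy fully connected network layer by layer and penalizes the squared gap between each quantized layer's output and the full-precision layer applied to the same cascaded input, so each layer's own error is isolated from upstream drift. The toy MLP, the `fake_quant` helper, and the squared-error penalty are assumptions for illustration, not the paper's actual Cascaded Layer Correction loss.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def fake_quant(w, bits=2):
    """Uniform symmetric fake quantization of a weight tensor."""
    step = np.abs(w).max() / (2 ** (bits - 1) - 1 + 1e-12)
    return np.round(w / step) * step

def cascaded_correction_loss(weights, x, bits=2):
    """One possible layer-wise correction objective: walk the layers in
    order on the quantized activations, and at each layer penalize the
    squared distance to the full-precision layer applied to the SAME
    (cascaded) input."""
    loss, a_q = 0.0, x
    for w in weights:
        target = relu(a_q @ w)                  # full-precision layer output
        a_q = relu(a_q @ fake_quant(w, bits))   # quantized layer output
        loss += np.mean((a_q - target) ** 2)
    return loss
```

Coarser quantization produces a larger alignment penalty, which is the signal a QAT optimizer would minimize alongside the task loss.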