🤖 AI Summary
Existing multi-bit quantized networks require full retraining for each bit-width, so computational cost grows linearly with the number of precision levels, and newly introduced precisions require additional fine-tuning. To address this, we propose a single-model, multi-precision training framework. First, we introduce a weight-bias correction technique to align activation distributions across bit-widths and mitigate quantization-induced distribution shifts. Second, we design a gradient-driven, per-bit coreset sampling strategy to enable cross-precision knowledge transfer within a shared backbone. Third, we integrate shared batch normalization and joint training of multiple submodels to support both ResNet and Vision Transformer (ViT) architectures. Our method achieves state-of-the-art accuracy on CIFAR-10/100, TinyImageNet, and ImageNet-1K while improving training efficiency by up to 7.88× over conventional per-precision training, significantly reducing the overhead of deploying multi-precision quantized models.
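The core intuition behind weight-bias correction can be illustrated with a minimal numpy sketch (function names and the mean-shift correction below are illustrative assumptions, not the paper's exact formulation): after quantizing a linear layer's weights, the expected output shift induced by the quantization error is folded into the layer bias, so the quantized layer's output mean matches the full-precision one.

```python
import numpy as np

def quantize(w, bits):
    # Symmetric uniform quantization to the given bit-width.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

def bias_corrected_forward(w, b, x, bits):
    # Neutralize the bias induced by quantization error: shift the
    # layer bias by the expected output error E[(w_q - w) x].
    w_q = quantize(w, bits)
    err = (w_q - w) @ x.mean(axis=0)   # expected per-output shift
    return x @ w_q.T + (b - err)       # corrected quantized layer

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))
b = np.zeros(4)
x = rng.normal(size=(64, 8))

y_fp = x @ w.T + b                          # full-precision output
y_q = bias_corrected_forward(w, b, x, 4)    # 4-bit, bias-corrected
# Output means are aligned after correction, even though
# per-sample outputs still carry quantization noise.
print(np.abs(y_fp.mean(0) - y_q.mean(0)).max())
```

With the mean shift neutralized for every bit-width, the activation statistics seen by a shared batch-normalization layer stay consistent across precisions, which is what makes sharing it feasible.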
📝 Abstract
Multi-bit quantization networks enable flexible deployment of deep neural networks by supporting multiple precision levels within a single model. However, existing approaches suffer from significant training overhead because full-dataset updates are repeated for each supported bit-width, so the cost scales linearly with the number of precisions. Additionally, extra fine-tuning stages are often required to support additional or intermediate precision options, further compounding the overall training burden. To address this issue, we propose two techniques that greatly reduce the training overhead without compromising model utility: (i) a weight-bias correction that enables shared batch normalization and eliminates the need for fine-tuning by neutralizing quantization-induced bias across bit-widths and aligning activation distributions; and (ii) a bit-wise coreset sampling strategy that lets each child model train on a compact, informative subset selected via gradient-based importance scores, exploiting the implicit knowledge transfer phenomenon. Experiments on CIFAR-10/100, TinyImageNet, and ImageNet-1K with both ResNet and ViT architectures demonstrate that our method achieves competitive or superior accuracy while reducing training time by up to 7.88×. Our code is released at https://github.com/a2jinhee/EMQNet_jk.
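The bit-wise coreset idea can be sketched in a few lines (a simplified illustration under assumed names; gradient-norm scoring is a common importance proxy, and the paper's exact scoring rule may differ): each child model scores training samples by the magnitude of their gradients and keeps only the top-scoring subset for its bit-width.

```python
import numpy as np

def coreset_indices(per_sample_grads, budget):
    # Importance score: L2 norm of each sample's gradient, a proxy
    # for how strongly that sample drives the child model's update.
    flat = per_sample_grads.reshape(len(per_sample_grads), -1)
    scores = np.linalg.norm(flat, axis=1)
    # Keep the `budget` highest-scoring samples as the coreset.
    return np.argsort(scores)[::-1][:budget]

# Toy example: one coreset per bit-width, built from that child
# model's per-sample gradients (here random stand-ins).
rng = np.random.default_rng(0)
grads_per_bit = {b: rng.normal(size=(100, 10)) for b in (2, 4, 8)}
coresets = {b: coreset_indices(g, budget=25)
            for b, g in grads_per_bit.items()}
```

Because the children share a backbone, updates from one bit-width's coreset implicitly benefit the others, which is why each child can train on only a fraction of the full dataset.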