Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning

📅 2023-07-02
🏛️ Pattern Recognition
📈 Citations: 15
✨ Influential: 0

🤖 AI Summary
To address the severe accuracy degradation of ultra-low-bit (≤4-bit) neural network quantization in data-free settings, this paper proposes a data-free, fine-tuning-free, layer-wise mixed-precision error compensation method. Unlike prior approaches, it requires no original training data, no synthetic data, and no post-quantization adaptation. Methodologically, it (1) builds an analytical model of the feature-map reconstruction error and derives a closed-form solution for the compensation parameters; and (2) pairs low-precision layers with high-precision layers in a layer-wise mixed-precision scheme, so that the high-precision layer compensates for the quantization error of the low-precision one and error propagation is suppressed without any data access. Experiments on ImageNet and other benchmarks show that it outperforms state-of-the-art data-free quantization methods at 4-bit and lower precisions, a regime where data-free quantization has previously performed poorly.
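Why ultra-low bit-widths hurt can be seen with a plain uniform quantizer. The NumPy sketch below is an illustration only: it assumes symmetric, per-tensor, round-to-nearest quantization of Gaussian stand-in weights (the summary does not specify the paper's quantizer), and `quantize_symmetric` is a hypothetical helper. It reports the weight-space mean squared error at several bit-widths.

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int) -> np.ndarray:
    """Hypothetical helper: uniform symmetric per-tensor quantization.

    Returns the de-quantized (float) weights so the error can be measured directly.
    """
    assert bits >= 2, "need at least one positive quantization level"
    qmax = 2 ** (bits - 1) - 1            # 1 for 2-bit, 7 for 4-bit, 127 for 8-bit
    max_abs = float(np.max(np.abs(w)))
    if max_abs == 0.0:                    # all-zero tensor: nothing to quantize
        return w.copy()
    scale = max_abs / qmax                # step size of the integer grid
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 128))        # stand-in for one layer's weights
for bits in (8, 6, 4, 3, 2):
    err = np.mean((w - quantize_symmetric(w, bits)) ** 2)
    print(f"{bits}-bit weight MSE: {err:.5f}")
```

With 2 bits this quantizer has only three levels, so most small weights collapse to zero; that coarse error is what a compensation step has to recover.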
๐Ÿ“ Abstract
Neural network quantization is a very promising solution for model compression, but the accuracy of the quantized model depends heavily on a training/fine-tuning process that requires the original data. This not only incurs heavy computation and time costs but is also detrimental to privacy and the protection of sensitive information. Therefore, a few recent works have started to focus on data-free quantization. However, data-free quantization performs poorly at ultra-low precision. Although researchers have partially addressed this problem with generative methods that synthesize data, data synthesis itself requires substantial computation and time. In this paper, we propose a data-free mixed-precision compensation (DF-MPC) method that recovers the performance of an ultra-low precision quantized model without any data or fine-tuning. Assuming that the quantization error caused by a low-precision quantized layer can be restored by reconstructing a high-precision quantized layer, we mathematically formulate the reconstruction loss between the pre-trained full-precision model and its layer-wise mixed-precision quantized model. Based on this formulation, we theoretically derive a closed-form solution by minimizing the reconstruction loss of the feature maps. Since DF-MPC does not require any original or synthetic data, it is a more efficient way to approximate the full-precision model. Experimentally, DF-MPC achieves higher accuracy for ultra-low precision quantized models than recent methods, without any data or fine-tuning.
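The closed-form idea can be illustrated on a simplified two-layer case. The derivation below is a hedged sketch under assumptions introduced here, not taken from the paper: layer 1 with weights W_1 is quantized to low precision, giving Q_1; each output channel i receives a positive scale c_i that is folded into the high-precision layer 2 (weights W_2); σ is ReLU; and the paper's feature-map reconstruction loss is replaced by a per-channel weight-space proxy. The paper's own loss and solution may differ.

```latex
\begin{align*}
% ReLU is positively homogeneous, so for c_i > 0 layer 2 can absorb a per-channel scale
W_2\,\mathrm{diag}(\mathbf{c})\,\sigma(Q_1 x) &= W_2\,\sigma\big(\mathrm{diag}(\mathbf{c})\,Q_1 x\big) \\
% Proxy loss for output channel i, where \mathbf{w}_i and \mathbf{q}_i are row i of W_1 and Q_1
\mathcal{L}_i(c_i) &= \lVert \mathbf{w}_i - c_i\,\mathbf{q}_i \rVert_2^2 \\
\frac{\partial \mathcal{L}_i}{\partial c_i} &= -2\,\mathbf{q}_i^\top(\mathbf{w}_i - c_i\,\mathbf{q}_i) = 0 \\
\Longrightarrow\quad c_i^\star &= \frac{\mathbf{q}_i^\top \mathbf{w}_i}{\mathbf{q}_i^\top \mathbf{q}_i}
\end{align*}
```

Because c_i = 1 is always a feasible choice, c_i^⋆ never increases the proxy error relative to plain quantization. With symmetric round-to-nearest quantization every nonzero entry of q_i has the same sign as the matching entry of w_i, so c_i^⋆ > 0 whenever q_i ≠ 0, which keeps the ReLU identity in the first line valid.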
Problem

Research questions and friction points this paper is trying to address.

Data-free quantization
Ultra-low precision recovery
No fine-tuning required
Innovation

Methods, ideas, or system contributions that make the work stand out.

Data-free mixed-precision compensation
No fine-tuning required
Mathematical reconstruction loss formulation
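The ideas listed above (data-free compensation, no fine-tuning, a reconstruction-loss formulation) can be put together in a small NumPy sketch. It is a hypothetical illustration, not the paper's DF-MPC algorithm: it assumes a bias-free Linear-ReLU-Linear pair, a uniform symmetric per-tensor quantizer, 2-bit and 8-bit precisions for the two layers, and a per-channel weight-space proxy in place of the paper's feature-map loss. The helper names `quantize_symmetric`, `compensation_coeffs`, and `compensate_pair` are made up for this example.

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric per-tensor quantizer; returns de-quantized floats."""
    assert bits >= 2, "need at least one positive quantization level"
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(w)))
    if max_abs == 0.0:
        return w.copy()
    scale = max_abs / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def compensation_coeffs(w1: np.ndarray, q1: np.ndarray) -> np.ndarray:
    """Closed-form least-squares scale per output channel (row) of layer 1:
    c_i = <w_i, q_i> / <q_i, q_i>. Rows that quantize to all zeros keep c_i = 1."""
    num = np.sum(w1 * q1, axis=1)
    den = np.sum(q1 * q1, axis=1)
    safe_den = np.where(den > 0, den, 1.0)   # avoid division by zero
    return np.where(den > 0, num / safe_den, 1.0)

def compensate_pair(w1: np.ndarray, w2: np.ndarray, low_bits: int = 2, high_bits: int = 8):
    """Quantize layer 1 to low_bits, fold its per-channel scales into the matching
    input columns of layer 2, then quantize layer 2 to high_bits. Uses weights only."""
    q1 = quantize_symmetric(w1, low_bits)
    c = compensation_coeffs(w1, q1)           # c_i > 0, so ReLU(c*z) == c*ReLU(z)
    q2 = quantize_symmetric(w2 * c[np.newaxis, :], high_bits)
    return q1, q2, c

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
w1 = rng.standard_normal((16, 8))    # layer 1: 8 inputs -> 16 channels
w2 = rng.standard_normal((4, 16))    # layer 2: 16 channels -> 4 outputs
q1, q2, c = compensate_pair(w1, w2)

# Random probe inputs are used only to report the error, not to compute c.
x = rng.standard_normal((8, 256))
ref = w2 @ relu(w1 @ x)
naive = quantize_symmetric(w2, 8) @ relu(q1 @ x)
comp = q2 @ relu(q1 @ x)
print("naive output MSE:      ", np.mean((ref - naive) ** 2))
print("compensated output MSE:", np.mean((ref - comp) ** 2))
```

The design choice here is that the low-precision layer is left untouched and all correction lives in the high-precision layer, whose finer grid can represent the rescaled weights with little extra error. Only the weight-space proxy is guaranteed to improve; the printed output errors are a sanity check, not a result.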