🤖 AI Summary
Post-training quantization (PTQ) of image super-resolution (SR) networks often suffers severe accuracy degradation due to outliers in activations, which are strongly correlated with chromatic information, as well as heterogeneous layer-wise quantization sensitivity. To address this, we propose an outlier-aware dual-region quantization framework. First, we explicitly model the statistical correlation between activation outliers and color distribution. Second, activations are partitioned into an outlier region and a dense region, each subjected to its own tailored uniform quantization. Finally, a lightweight, layer-sensitivity-driven fine-tuning step enhances the robustness of the most sensitive layers without full retraining. Evaluated on mainstream SR architectures (e.g., EDSR, RCAN) and benchmark datasets (Set5, Set14), our method significantly outperforms existing PTQ approaches, achieving PSNR values close to those of quantization-aware training (QAT) while accelerating inference by over 75×.
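To make the dual-region idea concrete, here is a minimal NumPy sketch that splits an activation tensor at a magnitude threshold and quantizes the dense and outlier regions on separate uniform grids. The percentile threshold, the shared bit-width, and the function names are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def dual_region_quantize(x, bits=8, outlier_pct=99.0):
    """Quantize activations with separate uniform grids for the
    dense region and the outlier region (illustrative sketch)."""
    # Split by magnitude: values above the percentile threshold form
    # the outlier region, everything else the dense region.
    # (The 99th-percentile cut is an assumed choice for this sketch.)
    thresh = np.percentile(np.abs(x), outlier_pct)
    outlier_mask = np.abs(x) > thresh

    def uniform_quantize(v, n_bits):
        # Standard asymmetric uniform quantization over [min, max].
        if v.size == 0:
            return v
        lo, hi = v.min(), v.max()
        scale = max(hi - lo, 1e-8) / (2 ** n_bits - 1)
        q = np.round((v - lo) / scale)
        return q * scale + lo

    out = np.empty_like(x)
    # Each region gets its own quantization grid, so the wide outlier
    # range no longer stretches the step size used for the dense bulk.
    out[~outlier_mask] = uniform_quantize(x[~outlier_mask], bits)
    out[outlier_mask] = uniform_quantize(x[outlier_mask], bits)
    return out
```

Because each region carries its own scale and zero point, the rare large values stop inflating the quantization step applied to the majority of activations, which is the bit-width balance the summary describes.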
📝 Abstract
Quantization techniques, including quantization-aware training (QAT) and post-training quantization (PTQ), have become essential for inference acceleration of image super-resolution (SR) networks. Compared to QAT, PTQ has garnered significant attention as it eliminates the need for ground truth and model retraining. However, existing PTQ methods for SR often fail to achieve satisfactory performance because they overlook the impact of outliers in activations. Our empirical analysis reveals that these prevalent activation outliers are strongly correlated with image color information, and directly removing them leads to significant performance degradation. Motivated by this, we propose a dual-region quantization strategy that partitions activations into an outlier region and a dense region, applying uniform quantization to each region independently to better balance bit-width allocation. Furthermore, we observe that different network layers exhibit varying sensitivities to quantization, leading to different levels of performance degradation. To address this, we introduce sensitivity-aware fine-tuning that encourages the model to focus more on highly sensitive layers, further enhancing quantization performance. Extensive experiments demonstrate that our method outperforms existing PTQ approaches across various SR networks and datasets, while achieving performance comparable to QAT methods in most scenarios with at least a 75× speedup.
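The sensitivity-aware fine-tuning step can likewise be sketched as a weighted calibration objective in PyTorch. The per-layer sensitivity metric (output-MSE increase when only that layer is quantized) and the argument names below are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def sensitivity_weighted_loss(fp_feats, q_feats, sensitivities):
    """Weighted feature-reconstruction loss for lightweight fine-tuning.

    fp_feats / q_feats: lists of per-layer activations from the
    full-precision and quantized models on calibration inputs.
    sensitivities: one scalar per layer, e.g. the degradation observed
    when only that layer is quantized (an assumed metric here).
    """
    w = torch.tensor(sensitivities, dtype=torch.float32)
    w = w / w.sum()  # normalize weights into a distribution
    loss = torch.zeros(())
    for wi, fp, q in zip(w, fp_feats, q_feats):
        # More sensitive layers contribute more to the loss, so the
        # short fine-tuning budget is spent where quantization hurts most.
        loss = loss + wi * F.mse_loss(q, fp)
    return loss
```

Under this reading, fine-tuning only has to match quantized features to their full-precision counterparts on calibration data, so no ground-truth high-resolution images or full retraining are required, consistent with the PTQ setting described above.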