🤖 AI Summary
Post-training quantization of Vision Transformers (ViTs) suffers significant accuracy degradation at low bit-widths, primarily due to two factors: (1) existing quantization methods fail to model the power-law distribution of post-Softmax activations, and (2) post-LayerNorm reparameterization is severely disrupted by outliers. To address these issues, we propose a distribution-aware and outlier-robust quantization framework. First, we introduce TanQ, a novel nonlinear quantizer explicitly designed to fit power-law-distributed activations. Second, we propose Median-Optimal Scaling Factor (MOSF), which replaces mean-based scaling with median-based scaling to robustly suppress outlier influence. Third, we design a channel-to-layer reparameterization strategy that jointly optimizes post-Softmax and post-LayerNorm activations. Evaluated on ImageNet, our 4-bit quantized ViT models achieve an average +3.2% Top-1 accuracy improvement over state-of-the-art methods, demonstrating strong generalization across architectures.
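To make the two core ideas concrete, here is a minimal sketch of a tan-shaped non-uniform quantizer in the spirit of TanQ. This is illustrative, not the paper's exact formula: the warp function and the curvature knob `alpha` (kept below π/2 so the warp stays finite) are assumptions, but the sketch shows the key property, i.e. quantization levels clustering near 1 where power-law post-Softmax activations carry the most information.

```python
import math

def tanq(x, n_bits=4, alpha=1.4):
    """Sketch of a tan-warped non-uniform quantizer (hypothetical form).

    Warps [0, 1] with tan before uniform rounding; the warp's slope is
    largest near x = 1, so values near 1 get finer resolution than a
    uniform quantizer would give them. `alpha` < pi/2 is an assumed
    curvature parameter, not taken from the paper.
    """
    levels = 2 ** n_bits - 1
    # forward warp: [0, 1] -> [0, 1], steep near 1
    w = math.tan(alpha * x) / math.tan(alpha)
    q = round(min(max(w, 0.0), 1.0) * levels)       # integer code
    # dequantize by inverting the warp
    x_hat = math.atan(q / levels * math.tan(alpha)) / alpha
    return q, x_hat
```

On a 4-bit grid, this sketch assigns roughly three times as many distinct levels to the interval [0.8, 1] as a uniform quantizer does, which is the behavior the summary describes for post-Softmax attention values.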
📝 Abstract
Vision transformers (ViTs) have garnered significant attention for their performance on vision tasks, but their high computational cost and latency hinder widespread adoption. Post-training quantization (PTQ), a promising approach to model compression, still suffers accuracy degradation on ViTs, for two reasons: the existing quantization paradigm does not fit the power-law distribution of post-Softmax activations well, and accuracy inevitably drops after reparameterizing post-LayerNorm activations. We propose a Distribution-Friendly and Outlier-Aware Post-training Quantization method for Vision Transformers, named DopQ-ViT. DopQ-ViT analyzes the inefficiencies of current quantizers and introduces a distribution-friendly Tan Quantizer, TanQ. By allocating more resolution to values near 1, TanQ preserves the power-law distribution of post-Softmax activations more faithfully. Moreover, when post-LayerNorm activations are reparameterized from channel-wise to layer-wise quantization, the accuracy drop stems mainly from outliers in the scaling factors. DopQ-ViT therefore selects the Median as the Optimal Scaling Factor (MOSF), which compensates for the influence of outliers and preserves the performance of the quantized model. Extensive experiments show that DopQ-ViT significantly improves quantized models, especially in low-bit settings.
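The MOSF idea can be sketched as follows. This is an illustrative reparameterization under assumed conventions, not the paper's exact procedure: the per-channel ratio r_c = s_c / s is folded into LayerNorm's affine parameters, and the layer scale s is the median of the channel scales, so a few outlier channels cannot inflate it the way a mean-based choice would.

```python
import statistics

def mosf_reparameterize(gamma, beta, channel_scales):
    """Collapse channel-wise quantization scales into one layer-wise scale.

    Sketch of median-based scale selection (the MOSF idea): pick the
    median of the per-channel scales as the layer scale s, then fold the
    ratio r_c = s_c / s into LayerNorm's affine parameters gamma, beta
    so the layer's output is unchanged (the next layer's weights would
    absorb r_c the same way). Illustrative, not the paper's exact steps.
    """
    s = statistics.median(channel_scales)            # robust to outlier channels
    r = [sc / s for sc in channel_scales]            # per-channel correction
    gamma_new = [g / rc for g, rc in zip(gamma, r)]
    beta_new = [b / rc for b, rc in zip(beta, r)]
    return gamma_new, beta_new, s, r
```

With one outlier channel whose scale is 40 among channels near 1, the mean scale would be pulled close to 9, while the median stays near 1; rescaling by r afterwards reproduces the original affine output exactly, which is why the reparameterization itself is lossless before quantization.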