🤖 AI Summary
To address the severe accuracy degradation of Vision Transformers (ViTs) under post-training quantization (PTQ) at low bit-widths (2–4 bits), this paper proposes I&S-ViT, a quantization framework designed to be both inclusive and stable. Methodologically, it introduces (1) the Shift-Uniform-Log₂ Quantizer (SULQ), which applies a shift before uniform quantization in the log₂ domain so that the full range of post-Softmax activations remains representable while their long-tailed distribution is approximated accurately; and (2) a three-stage Smooth Optimization Strategy (SOS), which combines the strengths of channel-wise and layer-wise quantization of post-LayerNorm activations to smooth the loss landscape and stabilize learning. Extensive experiments demonstrate a 50.68% top-1 accuracy improvement for 3-bit ViT-B, substantially outperforming existing ViT PTQ approaches. Moreover, the method maintains strong robustness and high performance across the 2–4 bit range, establishing new state-of-the-art results for low-bit ViT quantization.
📝 Abstract
Despite the scalable performance of vision transformers (ViTs), their dense computational costs (training and inference) undermine their position in industrial applications. Post-training quantization (PTQ), which tunes ViTs with a tiny dataset and runs them in a low-bit format, addresses the cost issue well but suffers larger performance drops at lower bit-widths. In this paper, we introduce I&S-ViT, a novel method that regulates the PTQ of ViTs in an inclusive and stable fashion. I&S-ViT first identifies two issues in the PTQ of ViTs: (1) quantization inefficiency of the prevalent log2 quantizer for post-Softmax activations; (2) a rugged and magnified loss landscape under the coarse-grained quantization granularity used for post-LayerNorm activations. I&S-ViT then addresses these issues by introducing: (1) a novel shift-uniform-log2 quantizer (SULQ), which applies a shift mechanism followed by uniform quantization to achieve both an inclusive input-domain representation and an accurate distribution approximation; (2) a three-stage smooth optimization strategy (SOS) that amalgamates the strengths of channel-wise and layer-wise quantization to enable stable learning. Comprehensive evaluations across diverse vision tasks validate I&S-ViT's superiority over existing ViT PTQ methods, particularly in low-bit scenarios. For instance, I&S-ViT elevates the performance of 3-bit ViT-B by an impressive 50.68%.
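The core SULQ idea (shift the input, then quantize uniformly in the log2 domain) can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's exact formulation: the shift value, bit-width, and rounding scheme here are illustrative choices.

```python
import numpy as np

def sulq_quantize(x, n_bits=3, shift=0.03):
    """Toy sketch of a shift-uniform-log2 quantizer (SULQ).

    Post-Softmax activations lie in [0, 1] with a long tail near zero,
    where a plain log2 quantizer concentrates all its levels. Adding a
    shift before the log2 mapping keeps the whole domain representable
    (inclusive) while still tracking the skewed distribution.
    NOTE: n_bits and shift are illustrative, not the paper's values.
    """
    levels = 2 ** n_bits - 1
    # Map to the log2 domain after shifting away from zero.
    y = np.log2(x + shift)
    # Uniformly quantize the log-domain values over their full range.
    y_min, y_max = np.log2(shift), np.log2(1.0 + shift)
    scale = (y_max - y_min) / levels
    q = np.clip(np.round((y - y_min) / scale), 0, levels)
    # Dequantize back to the original activation domain.
    return 2.0 ** (q * scale + y_min) - shift

attn = np.array([0.0, 0.01, 0.1, 0.5, 1.0])  # mock attention weights
deq = sulq_quantize(attn)
```

Because the endpoints of [0, 1] map exactly onto the lowest and highest quantization levels, 0.0 and 1.0 round-trip exactly, while intermediate values land on log-spaced levels that are denser near zero.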