AI Summary
To address the severe performance degradation in post-training quantization (PTQ) of Vision Transformers (ViTs) caused by coupled activation and weight quantization errors, this paper proposes ERQ, a two-stage error suppression framework. In the first stage, Aqer mitigates activation quantization error via Reparameterization Initialization followed by a closed-form Ridge Regression calibration of the still-full-precision weights. In the second stage, Wqer suppresses weight quantization error: Dual Uniform Quantization handles the outlier-heavy weights, and Rounding Refinement iteratively adjusts rounding directions using an efficient, empirically derived proxy. ERQ requires no fine-tuning, striking a strong balance between accuracy and efficiency: under the W3A4 setting, ViT-S attains 36.81% higher top-1 accuracy on ImageNet than GPTQ, substantially outperforming existing PTQ methods. Moreover, ERQ generalizes well across diverse ViT architectures and downstream tasks.
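The closed-form calibration in Aqer can be pictured as an ordinary ridge regression: keep the layer's target output (full-precision activations times full-precision weights) fixed, and solve for adjusted weights that reproduce it from the quantized activations. The sketch below is illustrative only; the function name, the exact objective, and the regularization strength are assumptions, not the paper's implementation.

```python
import numpy as np

def aqer_ridge_update(X_fp, X_q, W_fp, lam=1e-2):
    """Sketch of a closed-form ridge-regression weight update.

    Assumed objective:
        min_W ||X_q @ W - X_fp @ W_fp||^2 + lam * ||W - W_fp||^2
    i.e., adjust the (still full-precision) weights W so the layer output
    computed from quantized activations X_q matches the original output,
    while staying close to the original weights W_fp.
    """
    d = X_q.shape[1]
    target = X_fp @ W_fp                      # full-precision layer output
    A = X_q.T @ X_q + lam * np.eye(d)         # regularized normal matrix
    b = X_q.T @ target + lam * W_fp           # pulls solution toward W_fp
    return np.linalg.solve(A, b)              # closed-form ridge solution

# Toy usage: simulate activation quantization as additive noise.
rng = np.random.default_rng(0)
X_fp = rng.normal(size=(256, 16))
X_q = X_fp + 0.1 * rng.normal(size=(256, 16))  # "quantized" activations
W_fp = rng.normal(size=(16, 8))
W = aqer_ridge_update(X_fp, X_q, W_fp)
```

Because the update is closed-form, it needs only a small calibration batch and no gradient-based fine-tuning, which is what gives this stage its efficiency.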
Abstract
Post-training quantization (PTQ) for vision transformers (ViTs) has received increasing attention from both academic and industrial communities due to its minimal data needs and high time efficiency. However, many current methods fail to account for the complex interactions between quantized weights and activations, resulting in significant quantization errors and suboptimal performance. This paper presents ERQ, an innovative two-step PTQ method specifically crafted to sequentially reduce the quantization errors arising from activation and weight quantization. The first step, Activation quantization error reduction (Aqer), applies Reparameterization Initialization to mitigate the initial quantization errors in high-variance activations. It then further reduces the errors by formulating a Ridge Regression problem, which updates the weights maintained at full precision using a closed-form solution. The second step, Weight quantization error reduction (Wqer), applies Dual Uniform Quantization to handle weights with numerous outliers, which arise from the adjustments made during Reparameterization Initialization, thereby reducing initial weight quantization errors. It then tackles the remaining errors iteratively: each iteration adopts Rounding Refinement, which uses an empirically derived, efficient proxy to refine the rounding directions of quantized weights, complemented by a Ridge Regression solver to reduce the errors. Comprehensive experimental results demonstrate ERQ's superior performance across various ViT variants and tasks. For example, ERQ surpasses the state-of-the-art GPTQ by a notable 36.81% in accuracy for W3A4 ViT-S. Our codes are available at https://github.com/zysxmu/ERQ.
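The intuition behind Dual Uniform Quantization is that a handful of outliers inflate the step size of a single uniform quantizer and wash out all the small weights. Splitting the weights into an inlier group and an outlier group, each with its own uniform quantizer, avoids this. The sketch below is a minimal illustration under assumed choices (a magnitude-percentile split, symmetric per-tensor scales); the function names and the outlier fraction are hypothetical, not the paper's exact scheme.

```python
import numpy as np

def uniform_quant(w, n_bits):
    """Symmetric uniform quantizer with a per-tensor scale."""
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = np.abs(w).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def dual_uniform_quant(w, n_bits=3, outlier_pct=1.0):
    """Sketch of dual uniform quantization (assumed form): split weights
    by magnitude into inliers and outliers, then quantize each group with
    its own uniform quantizer so outliers don't inflate the inlier scale."""
    thresh = np.percentile(np.abs(w), 100 - outlier_pct)
    outliers = np.abs(w) > thresh
    w_q = np.empty_like(w)
    w_q[~outliers] = uniform_quant(w[~outliers], n_bits)  # fine inlier scale
    w_q[outliers] = uniform_quant(w[outliers], n_bits)    # coarse outlier scale
    return w_q

# Toy usage: mostly small weights plus a few large outliers,
# mimicking the outlier-heavy weights described in the abstract.
rng = np.random.default_rng(0)
w = np.concatenate([rng.normal(0.0, 0.02, size=1000),
                    np.array([1.0, -1.2, 0.9, -1.1])])
w_dual = dual_uniform_quant(w, n_bits=3)
w_single = uniform_quant(w, n_bits=3)
```

On such a heavy-tailed weight vector, the single 3-bit quantizer rounds nearly every inlier to zero, while the dual scheme preserves them, so its mean squared error is markedly lower.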