Towards Accurate Post-Training Quantization of Vision Transformers via Error Reduction

📅 2024-07-09
📈 Citations: 3
✨ Influential: 0
🤖 AI Summary
To address the severe performance degradation that post-training quantization (PTQ) inflicts on Vision Transformers (ViTs) when activation and weight quantization errors are coupled, this paper proposes ERQ, a two-stage error-suppression framework. In the first stage, Activation quantization error reduction (Aqer) mitigates activation quantization error via Reparameterization Initialization followed by a closed-form ridge-regression update of the still-full-precision weights. In the second stage, Weight quantization error reduction (Wqer) suppresses weight quantization error: Dual Uniform Quantization handles the outlier-heavy weights produced by the first stage, and Rounding Refinement iteratively adjusts rounding directions with an efficient proxy. ERQ requires no fine-tuning and strikes a strong balance between accuracy and efficiency: under the W3A4 setting, ViT-S attains 36.81% higher top-1 accuracy on ImageNet than GPTQ, and ERQ generalizes well across diverse ViT architectures and downstream tasks.
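To make the closed-form calibration step concrete, below is a minimal NumPy sketch of a ridge-regression weight update of the kind Aqer describes: given full-precision calibration activations X_fp, their quantized counterpart X_q, and the layer weights W, the still-full-precision weights are re-solved so that the output computed from quantized inputs matches the original output. The function name, the penalty lam, and the exact objective are illustrative assumptions, not ERQ's published formulation.

```python
import numpy as np

def ridge_weight_update(X_fp, X_q, W, lam=1e-2):
    """Illustrative closed-form ridge update (hypothetical helper, not ERQ's exact solver).

    Re-solves the full-precision weights so that the layer output computed
    from quantized activations X_q approximates the original X_fp @ W.
    """
    d = X_q.shape[1]
    A = X_q.T @ X_q + lam * np.eye(d)  # regularized Gram matrix of quantized inputs
    B = X_q.T @ (X_fp @ W)             # correlation with the full-precision output
    return np.linalg.solve(A, B)       # closed form: no gradient-based fine-tuning
```

Because the solution is closed form, such an update can run once per layer on a small calibration batch, which is what keeps this style of calibration fine-tuning-free.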

πŸ“ Abstract
Post-training quantization (PTQ) for vision transformers (ViTs) has received increasing attention from both academic and industrial communities due to its minimal data needs and high time efficiency. However, many current methods fail to account for the complex interactions between quantized weights and activations, resulting in significant quantization errors and suboptimal performance. This paper presents ERQ, an innovative two-step PTQ method specifically crafted to sequentially reduce the quantization errors arising from activation and weight quantization. The first step, Activation quantization error reduction (Aqer), first applies Reparameterization Initialization to mitigate initial quantization errors in high-variance activations. It then further mitigates the errors by formulating a Ridge Regression problem, which updates the weights maintained at full precision using a closed-form solution. The second step, Weight quantization error reduction (Wqer), first applies Dual Uniform Quantization to handle weights with numerous outliers, which arise from the adjustments made during Reparameterization Initialization, thereby reducing initial weight quantization errors. It then employs an iterative approach to further tackle the errors: in each iteration, Rounding Refinement uses an empirically derived, efficient proxy to refine the rounding directions of quantized weights, complemented by a Ridge Regression solver to reduce the errors. Comprehensive experimental results demonstrate ERQ's superior performance across various ViT variants and tasks. For example, ERQ surpasses the state-of-the-art GPTQ by a notable 36.81% in accuracy for W3A4 ViT-S. Our code is available at https://github.com/zysxmu/ERQ.
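The Dual Uniform Quantization step is easiest to see in code. The sketch below is a hedged illustration rather than the paper's exact scheme: it splits a weight tensor into inliers and outliers by a magnitude threshold and gives each group its own uniform step size, so a handful of large values (such as those introduced by Reparameterization Initialization) do not coarsen the grid for the bulk of the weights. The percentile threshold, the symmetric rounding, and the function name are assumptions for illustration.

```python
import numpy as np

def dual_uniform_quantize(w, bits=3, outlier_pct=99.0):
    """Hypothetical two-scale uniform quantizer (illustration, not ERQ's exact scheme).

    Inliers and outliers each get their own uniform step size, so a few
    large-magnitude weights do not inflate the step size for the majority.
    """
    qmax = 2 ** (bits - 1) - 1                      # e.g. 3 for 3-bit symmetric
    thresh = np.percentile(np.abs(w), outlier_pct)  # magnitude split point
    inlier_mask = np.abs(w) <= thresh

    def uniform_q(x, mask):
        vals = np.abs(x[mask])
        scale = max(vals.max() / qmax, 1e-12) if vals.size else 1.0
        return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

    # Quantize with each group's scale, then pick per element by membership.
    return np.where(inlier_mask, uniform_q(w, inlier_mask), uniform_q(w, ~inlier_mask))
```

ERQ derives its grouping and per-group parameters differently; the point here is only that two step sizes cut the initial weight quantization error when the weight distribution is heavy-tailed.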
Problem

Research questions and friction points this paper is trying to address.

Severe accuracy degradation when ViTs are quantized post-training at low bit-widths
Coupled interactions between activation and weight quantization errors that existing PTQ methods fail to model
Suboptimal performance of current PTQ methods under aggressive settings such as W3A4
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-step PTQ method
Reparameterization Initialization (see the sketch after this list)
Dual Uniform Quantization
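The Reparameterization Initialization listed above relies on a standard equivalence trick for a LayerNorm feeding a linear layer: dividing the LayerNorm affine parameters by per-channel factors tames high-variance activation channels, and scaling the matching input channels of the next layer's weights by the same factors leaves the network function unchanged. The sketch below shows only this equivalence, under assumed shapes; how ERQ actually chooses the factors s is not reproduced here.

```python
import numpy as np

def reparameterize_layernorm(gamma, beta, W_next, s):
    """Fold per-channel factors s out of a LayerNorm and into the next linear layer.

    (LN(x) * gamma + beta) @ W_next is unchanged when gamma and beta are
    divided by s and the matching input channels (rows) of W_next are
    multiplied by s, since the factors cancel channel-wise.
    """
    gamma_new = gamma / s
    beta_new = beta / s
    W_next_new = W_next * s[:, None]  # row c of W_next scaled by s[c]
    return gamma_new, beta_new, W_next_new
```

After this rewrite the high-variance channels are shrunk before activation quantization, at the cost of the outlier-heavy next-layer weights that Dual Uniform Quantization is then designed to absorb.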
👥 Authors
Yunshan Zhong (Hainan University)
Jiawei Hu (PhD student, University of New South Wales; mobile computing, ubiquitous computing)
You Huang (Xiamen University; segmentation, interactive segmentation, transformers)
Yuxin Zhang (Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University; Department of Artificial Intelligence, School of Informatics, Xiamen University)
Rongrong Ji (Institute of Artificial Intelligence, Xiamen University; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University; Department of Artificial Intelligence, School of Informatics, Xiamen University; Peng Cheng Laboratory)