🤖 AI Summary
Existing post-training quantization (PTQ) methods struggle to enable fully integer-only inference for the nonlinear layers of vision Transformers (particularly GELU and Softmax), often resorting to activation-distribution tuning or partial quantization and thereby compromising either accuracy or efficiency. To address this, we propose IPTQ-ViT, the first retraining-free, fully integer-only PTQ framework for ViTs. First, we replace GELU with a low-degree polynomial approximation and Softmax with bit-shift-based operations, eliminating floating-point nonlinearities entirely. Second, we introduce a unified layer-wise metric that jointly considers quantization sensitivity, output perturbation, and computational overhead to select the best approximation for each activation layer. On image classification, IPTQ-ViT achieves an average +1.78 percentage-point gain in top-1 accuracy (up to +6.44 points), and it improves object detection by +1.0 mAP, matching the accuracy of quantization-aware training while enabling efficient W8A8/W4A8 deployment.
📝 Abstract
Previous Quantization-Aware Training (QAT) methods for vision transformers rely on expensive retraining to recover the accuracy lost when quantizing non-linear layers, limiting their use in resource-constrained environments. Existing Post-Training Quantization (PTQ) methods, in contrast, either quantize non-linear functions only partially or adjust activation distributions to maintain accuracy, and thus fail to achieve fully integer-only inference. In this paper, we introduce IPTQ-ViT, a novel PTQ framework for fully integer-only vision transformers without retraining. We present two approximation functions: a polynomial-based GELU optimized for vision data and a bit-shifting-based Softmax designed to improve approximation accuracy under PTQ. In addition, we propose a unified metric integrating quantization sensitivity, perturbation, and computational cost to select the optimal approximation function for each activation layer. IPTQ-ViT outperforms previous PTQ methods, achieving up to a 6.44%p (avg. 1.78%p) top-1 accuracy improvement on image classification and a 1.0 mAP improvement on object detection. It surpasses partial floating-point PTQ methods under both W8A8 and W4A8, and matches the accuracy and latency of integer-only QAT methods. We plan to release our code at https://github.com/gihwan-kim/IPTQ-ViT.git.
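To make the two approximation ideas concrete: the abstract does not give the paper's actual polynomial coefficients or shift scheme, so the sketch below uses well-known integer-friendly stand-ins, an i-GELU-style clipped quadratic for GELU and a base-2 decomposition of exp() (integer part becomes a bit-shift, fractional part a first-order polynomial) for Softmax. All coefficients and function names here are illustrative assumptions, not the paper's method.

```python
import math

def poly_gelu(x, a=-0.2888, b=-1.769):
    """Low-degree polynomial GELU approximation (i-GELU-style coefficients,
    assumed for illustration; the paper's polynomial may differ).
    GELU(x) = x * Phi(x); the erf term in Phi is replaced by a clipped,
    sign-symmetric quadratic that maps cleanly onto integer arithmetic."""
    q = min(abs(x) / math.sqrt(2.0), -b)          # clip |x|/sqrt(2) to [0, -b]
    L = math.copysign(a * (q + b) ** 2 + 1.0, x)  # sign(x)*[a(q+b)^2 + 1] ~ erf(x/sqrt(2))
    return x * 0.5 * (1.0 + L)

def shift_exp(x):
    """exp(x) via base-2 decomposition: exp(x) = 2**(x / ln 2).
    The integer part of x/ln2 becomes a pure bit-shift on quantized values;
    the fractional part is covered by a first-order polynomial (illustrative)."""
    z = x / math.log(2.0)
    k = math.floor(z)               # integer part -> shift by k bits
    r = z - k                       # fractional part in [0, 1)
    frac = 1.0 + r * math.log(2.0)  # cheap first-order fit of 2**r
    return frac * 2.0 ** k          # 2**k is realized as a bit-shift in integer code

def shift_softmax(xs):
    """Softmax built from shift_exp; subtracting the max keeps inputs <= 0."""
    m = max(xs)
    es = [shift_exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]
```

In an actual integer-only kernel both functions operate on quantized integers and scale factors rather than floats; the float version above only shows why these forms avoid `erf` and `exp` entirely.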