Understanding vision transformer robustness through the lens of out-of-distribution detection

📅 2026-02-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the impact of low-bit quantization on out-of-distribution (OOD) detection performance in vision transformers, revealing a significant degradation, particularly under 4-bit settings. The authors systematically evaluate small model variants of DeiT, DeiT3, and ViT, pretrained on either ImageNet-1k or ImageNet-22k, using metrics such as AUPR-out to assess OOD robustness post-quantization. Notably, the work demonstrates for the first time from an OOD perspective that large-scale pretraining on ImageNet-22k exacerbates the decline in OOD robustness after quantization, with AUPR-out dropping by 15.0%–19.2%, markedly higher than the 9.5%–12.0% reduction observed in ImageNet-1k-pretrained counterparts. Furthermore, the study highlights that data augmentation strategies are more effective than merely scaling up pretraining data for enhancing OOD robustness in quantized models.

📝 Abstract
Vision transformers have shown remarkable performance in vision tasks, but enabling them for accessible and real-time use is still challenging. Quantization reduces memory and inference costs at the risk of performance loss. Strides have been made to mitigate low-precision issues mainly by understanding in-distribution (ID) task behaviour, but the attention mechanism may provide insight on quantization attributes by exploring out-of-distribution (OOD) situations. We investigate the behaviour of quantized small-variant popular vision transformers (DeiT, DeiT3, and ViT) on common OOD datasets. ID analyses show the initial instabilities of 4-bit models, particularly of those trained on the larger ImageNet-22k, as the strongest FP32 model, DeiT3, sharply drops 17% from quantization error to become one of the weakest 4-bit models. While ViT shows reasonable quantization robustness for ID calibration, OOD detection reveals more: ViT and DeiT3 pretrained on ImageNet-22k respectively experienced a 15.0% and 19.2% average quantization delta in AUPR-out from full precision to 4-bit, while their ImageNet-1k-only counterparts experienced a 9.5% and 12.0% delta. Overall, our results suggest pretraining on large-scale datasets may hinder low-bit quantization robustness in OOD detection and that data augmentation may be a more beneficial option.
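As a rough illustration (not code from the paper), the AUPR-out metric in the abstract can be computed with scikit-learn's `average_precision_score` by treating OOD samples as the positive class. All scores and the FP32/4-bit values below are made-up for the sketch; one plausible reading of the paper's "quantization delta" is the drop in AUPR-out from full precision to the 4-bit model:

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Hypothetical detector confidence scores (higher = more in-distribution).
id_scores = np.array([0.9, 0.5, 0.85, 0.7, 0.95])  # in-distribution samples
ood_scores = np.array([0.3, 0.6, 0.2, 0.4])        # out-of-distribution samples

# AUPR-out treats OOD as the positive class: label OOD samples 1
# and negate the scores so larger values indicate OOD.
labels = np.concatenate([np.zeros(len(id_scores)), np.ones(len(ood_scores))])
scores = -np.concatenate([id_scores, ood_scores])
aupr_out = average_precision_score(labels, scores)
print(f"AUPR-out: {aupr_out:.3f}")

# Illustrative quantization delta: AUPR-out drop from FP32 to 4-bit.
aupr_fp32, aupr_4bit = 0.90, 0.75  # invented values, not from the paper
delta = aupr_fp32 - aupr_4bit
print(f"Quantization delta: {delta:.3f}")
```

The sign flip on the scores matters: precision-recall metrics assume higher scores mean "more positive", so an ID-confidence score must be negated before OOD is treated as the positive class.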
Problem

Research questions and friction points this paper is trying to address.

vision transformer
quantization robustness
out-of-distribution detection
low-bit quantization
pretraining dataset
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision Transformer
Quantization Robustness
Out-of-Distribution Detection
Low-bit Quantization
Pretraining Scale
Joey Kuang
Vision and Image Processing Research Group, Department of Systems Design Engineering, University of Waterloo, Ontario, Canada
Alexander Wong
Canada Research Chair FIET FInstP FRSPH FRSM FRGS FGS FRSA FISDDE, University of Waterloo
Artificial Intelligence · Machine Learning · Image Processing · Computer Vision · Medical Imaging