AI Summary
Prior work on CLIP quantization focuses predominantly on accuracy degradation, overlooking its impact on reliability aspects such as calibration quality and out-of-distribution (OOD) detection. Method: We conduct a systematic evaluation of diverse quantization strategies, including quantization-aware training (QAT), on CLIP's reliability. We identify counterintuitive phenomena: quantization can improve calibration for underconfident models and even enhance OOD detection performance despite calibration deterioration. Building on these insights, we propose a tailored QAT framework explicitly optimizing for reliability. Contribution/Results: Extensive experiments demonstrate that quantization need not compromise reliability; under specific conditions, it simultaneously improves zero-shot classification accuracy, temperature-scaled expected calibration error (ECE), and OOD detection AUC. Our results challenge the conventional efficiency-performance trade-off assumption, establishing that quantization, when properly designed, can jointly enhance accuracy, calibration, and robustness in zero-shot vision-language learning.
Abstract
The powerful zero-shot generalization capabilities of vision-language models (VLMs) like CLIP have enabled new paradigms for safety-related tasks such as out-of-distribution (OOD) detection. However, additional aspects crucial for the computationally efficient and reliable deployment of CLIP are still overlooked. In particular, the impact of quantization on CLIP's performance beyond accuracy remains underexplored. This work presents a large-scale evaluation of quantization on CLIP models, assessing not only in-distribution accuracy but a comprehensive suite of reliability metrics and revealing counterintuitive results driven by pre-training source. We demonstrate that quantization consistently improves calibration for typically underconfident pre-trained models, while often degrading it for overconfident variants. Intriguingly, this degradation in calibration does not preclude gains in other reliability metrics; we find that OOD detection can still improve for these same poorly calibrated models. Furthermore, we identify specific quantization-aware training (QAT) methods that yield simultaneous gains in zero-shot accuracy, calibration, and OOD robustness, challenging the view of a strict efficiency-performance trade-off. These findings offer critical insights for navigating the multi-objective problem of deploying efficient, reliable, and robust VLMs by utilizing quantization beyond its conventional role.
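For readers unfamiliar with the calibration metric referred to above, the following is a minimal sketch of expected calibration error (ECE) using the common equal-width binning formulation. It is not necessarily the exact variant used in this work; the function name `expected_calibration_error` and the default of 10 bins are illustrative assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin.

    confidences: predicted top-class probabilities in [0, 1]
    correct:     truthy values where the prediction matched the label
    n_bins:      number of equal-width bins (assumed default, not from the paper)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Per-bin sample count, summed confidence, and summed correctness.
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    acc_sums = [0.0] * n_bins
    for c, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        counts[idx] += 1
        conf_sums[idx] += c
        acc_sums[idx] += 1.0 if ok else 0.0

    ece = 0.0
    for k in range(n_bins):
        if counts[k] == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sums[k] / counts[k]
        avg_acc = acc_sums[k] / counts[k]
        ece += (counts[k] / n) * abs(avg_acc - avg_conf)
    return ece
```

In this formulation, a bin where average accuracy exceeds average confidence indicates underconfidence, and the reverse indicates overconfidence; ECE itself reports only the magnitude of the gap.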