Calibrating Uncertainty for Zero-Shot Adversarial CLIP

📅 2025-12-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
CLIP exhibits poor robustness and severe miscalibration under adversarial attacks in zero-shot classification: perturbations frequently cause both accuracy degradation and reduced uncertainty, leading to overconfidence. This reliability deficiency stems from the neglect of uncertainty calibration in existing adversarial fine-tuning methods. To address this, we propose the first unified framework optimizing both robustness and calibration for CLIP in zero-shot settings. Our approach reparameterizes CLIP’s outputs as a Dirichlet distribution, jointly modeling semantic structure and confidence. We further introduce a distribution-level uncertainty calibration mechanism based on Dirichlet distribution alignment—moving beyond conventional logit-level matching. Evaluated across multiple zero-shot benchmarks, our method reduces Expected Calibration Error (ECE) by over 40% without sacrificing accuracy and achieves state-of-the-art adversarial robustness.

📝 Abstract
CLIP delivers strong zero-shot classification but remains highly vulnerable to adversarial attacks. Previous work on adversarial fine-tuning largely focuses on matching the predicted logits between clean and adversarial examples, which overlooks uncertainty calibration and may degrade zero-shot generalization. A common expectation in reliable uncertainty estimation is that predictive uncertainty should increase as inputs become more difficult or shift away from the training distribution. In the adversarial setting, however, we frequently observe the opposite: perturbations not only degrade accuracy but also suppress uncertainty, leading to severe miscalibration and unreliable over-confidence. This overlooked phenomenon highlights a critical reliability gap beyond robustness. To bridge it, we propose a novel adversarial fine-tuning objective for CLIP that accounts for both prediction accuracy and uncertainty alignment. By reparameterizing the output of CLIP as the concentration parameter of a Dirichlet distribution, we obtain a unified representation that captures both the relative semantic structure and the magnitude of predictive confidence. Our objective aligns these distributions holistically under perturbations, moving beyond single-logit anchoring and restoring calibrated uncertainty. Experiments on multiple zero-shot classification benchmarks demonstrate that our approach effectively restores calibrated uncertainty and achieves competitive adversarial robustness while maintaining clean accuracy.
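As a rough sketch of the reparameterization idea described above: zero-shot similarity logits can be mapped to the concentration parameters of a Dirichlet distribution, whose mean gives the class probabilities and whose total concentration encodes confidence. The exponential link and temperature used here are illustrative assumptions; the paper's exact parameterization is not given in the abstract.

```python
import numpy as np

def dirichlet_from_logits(logits, tau=1.0):
    """Map zero-shot similarity logits to Dirichlet concentration
    parameters (illustrative exponential link, an assumption)."""
    alpha = np.exp(logits / tau)        # concentration alpha_k > 0
    alpha0 = alpha.sum()                # total evidence = overall confidence
    mean = alpha / alpha0               # expected class probabilities
    # Variance of each class probability under Dir(alpha):
    var = mean * (1.0 - mean) / (alpha0 + 1.0)
    return alpha, mean, var

# Toy example: 3 classes, logits from image-text cosine similarities.
logits = np.array([2.0, 0.5, -1.0])
alpha, mean, var = dirichlet_from_logits(logits)
```

A flatter Dirichlet (small `alpha0`) signals high uncertainty; a peaked one signals confidence, so a single object carries both the semantic ranking and its reliability.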
Problem

Research questions and friction points this paper is trying to address.

Zero-shot CLIP is both non-robust and severely miscalibrated under adversarial attacks
Adversarial perturbations induce over-confidence instead of increased uncertainty
Existing adversarial fine-tuning matches logits but neglects uncertainty calibration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial fine-tuning with uncertainty calibration
Reparameterizing CLIP outputs as Dirichlet distribution
Aligning semantic structure and confidence under perturbations
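The distribution-level alignment contribution can be sketched with the closed-form KL divergence between two Dirichlet distributions (clean vs. adversarial), which compares entire distributions rather than anchoring single logits. Using KL here is a plausible instance of "Dirichlet distribution alignment"; the paper's actual alignment objective is an assumption on our part.

```python
import numpy as np
from scipy.special import gammaln, digamma

def dirichlet_kl(alpha, beta):
    """Closed-form KL( Dir(alpha) || Dir(beta) )."""
    a0, b0 = alpha.sum(), beta.sum()
    return (gammaln(a0) - gammaln(alpha).sum()
            - gammaln(b0) + gammaln(beta).sum()
            + ((alpha - beta) * (digamma(alpha) - digamma(a0))).sum())

# Hypothetical concentrations from clean and adversarially perturbed inputs:
alpha_clean = np.array([5.0, 1.0, 1.0])
alpha_adv   = np.array([1.2, 3.0, 1.1])
loss = dirichlet_kl(alpha_adv, alpha_clean)  # penalizes adversarial drift
```

Minimizing such a term pulls both the semantic structure (relative concentrations) and the confidence magnitude (total concentration) of the adversarial prediction back toward the clean one.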
Wenjing Lu
RIKEN Center for Advanced Intelligence Project (RIKEN AIP)
Zerui Tao
RIKEN Center for Advanced Intelligence Project (RIKEN AIP)
Dongping Zhang
RIKEN Center for Advanced Intelligence Project (RIKEN AIP)
Yuning Qiu
RIKEN Center for Advanced Intelligence Project (RIKEN AIP)
Yang Yang
Shanghai Jiao Tong University
Qibin Zhao
RIKEN AIP
Machine Learning · Tensor Decomposition · Tensor Networks