CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP

📅 2025-03-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: CLIP is vulnerable to adversarial perturbations in zero-shot image-text matching, and existing approaches to zero-shot robustness rely on adversarially fine-tuning the vision encoder. Method: The paper proposes a training-free, test-time defense for zero-shot CLIP, which the authors describe as the first of its kind. It uses CLIP's pretrained vision encoder to counterattack incoming adversarial images during inference, with no additional training and no external networks. Contribution/Results: The authors show that perturbations crafted to maximise the classification loss produce "falsely stable" images, and they exploit this property to counterattack such images. The method is orthogonal to fine-tuning-based defenses and further improves the robustness of CLIP models that have already been adversarially fine-tuned. Across 16 classification datasets, it delivers stable and consistent robustness gains over test-time defenses adapted from prior work that do not rely on external networks, without noticeably impairing accuracy on clean images.

📝 Abstract

Despite its prevalent use in image-text matching tasks in a zero-shot manner, CLIP has been shown to be highly vulnerable to adversarial perturbations added onto images. Recent studies propose to finetune the vision encoder of CLIP with adversarial samples generated on the fly, and show improved robustness against adversarial attacks on a spectrum of downstream datasets, a property termed zero-shot robustness. In this paper, we show that malicious perturbations that seek to maximise the classification loss lead to "falsely stable" images, and propose to leverage the pre-trained vision encoder of CLIP to counterattack such adversarial images during inference to achieve robustness. Our paradigm is simple and training-free, providing the first method to defend CLIP from adversarial attacks at test time, which is orthogonal to existing methods aiming to boost zero-shot adversarial robustness of CLIP. We conduct experiments across 16 classification datasets, and demonstrate stable and consistent gains compared to test-time defence methods adapted from existing adversarial robustness studies that do not rely on external networks, without noticeably impairing performance on clean images. We also show that our paradigm can be employed on CLIP models that have been adversarially finetuned to further enhance their robustness at test time. Our code is available at https://github.com/Sxing2/CLIP-Test-time-Counterattacks.
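
The abstract does not spell out the counterattack objective, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the counterattack pushes the embedding of the incoming image away from its original embedding under an L-infinity budget, using signed gradient steps. A toy linear map stands in for CLIP's vision encoder, and the names `encode`, `counterattack`, `eps`, `step` and `n_steps` are all hypothetical.

```python
import random

def encode(W, x):
    """Toy linear stand-in for an image encoder: returns W @ x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def counterattack(W, x, eps=0.1, step=0.025, n_steps=4, seed=0):
    """Signed-gradient ascent on ||f(x + d) - f(x)||^2 subject to ||d||_inf <= eps.

    Assumption: this embedding-displacement objective is illustrative only.
    """
    rng = random.Random(seed)
    # Random start, because the gradient of the objective is zero at d = 0.
    d = [rng.uniform(-eps, eps) for _ in x]
    base = encode(W, x)
    for _ in range(n_steps):
        shifted = encode(W, [xi + di for xi, di in zip(x, d)])
        diff = [a - b for a, b in zip(shifted, base)]
        # For the linear encoder the gradient w.r.t. d is 2 * W^T diff.
        grad = [2.0 * sum(W[i][j] * diff[i] for i in range(len(W)))
                for j in range(len(x))]
        # Signed step, then projection back into the eps-ball.
        d = [max(-eps, min(eps, dj + step * ((g > 0) - (g < 0))))
             for dj, g in zip(d, grad)]
    return d

W = [[1.0, -2.0, 0.5], [0.0, 1.0, 3.0], [2.0, 0.0, 1.0]]
x = [0.2, 0.4, 0.6]  # stands in for a (possibly adversarial) input image
d = counterattack(W, x)
x_defended = [xi + di for xi, di in zip(x, d)]  # image passed on for classification
```

With a real CLIP model, the gradient would come from automatic differentiation through the vision encoder rather than from the closed-form expression used here.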

Problem

Research questions and friction points this paper is trying to address.

CLIP is highly vulnerable to adversarial perturbations added to images

Existing zero-shot robustness methods require adversarially finetuning the vision encoder

Defending CLIP at test time without training or external networks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages pre-trained CLIP vision encoder

Implements test-time counterattacks for robustness

Training-free defense against adversarial images
