Adversarially Robust CLIP Models Can Induce Better (Robust) Perceptual Metrics

📅 2025-02-17
📈 Citations: 3
Influential: 1
🤖 AI Summary
Neural perceptual metrics built on models like CLIP lack robustness against adversarial attacks, limiting their reliability in safety-critical zero-shot evaluation tasks. Method: The paper proposes R-CLIP$_\textrm{F}$, an unsupervised adversarial fine-tuning framework that optimizes CLIP's feature space for robustness under adversarial perturbations without requiring labeled data. It further employs feature and text inversion to improve interpretability and enable visualization of learned visual concepts. Contribution/Results: R-CLIP$_\textrm{F}$ achieves both high robustness and high discriminability in zero-shot perceptual similarity modeling. Experiments demonstrate superior performance over state-of-the-art metrics on zero-shot perceptual assessment, robust image retrieval, and NSFW content detection. Crucially, it maintains high accuracy under adversarial perturbations while preserving performance on unperturbed images, narrowing the robustness–accuracy trade-off without supervision.
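The perceptual metric induced by a CLIP-style encoder is typically the cosine distance between image embeddings. A minimal sketch (the random vectors here are stand-ins for embeddings from a robust CLIP image encoder, not the paper's actual R-CLIP$_\textrm{F}$ weights):

```python
import numpy as np

def perceptual_distance(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Cosine distance between two feature vectors (lower = more similar)."""
    a = feat_a / np.linalg.norm(feat_a)
    b = feat_b / np.linalg.norm(feat_b)
    return float(1.0 - a @ b)

# Stand-in embeddings; in practice these come from a (robust) CLIP image encoder.
rng = np.random.default_rng(0)
x = rng.normal(size=512)
assert perceptual_distance(x, x) < 1e-9          # identical inputs -> distance ~0
assert perceptual_distance(x, 2.0 * x) < 1e-9    # invariant to embedding scale
assert 0.0 <= perceptual_distance(x, rng.normal(size=512)) <= 2.0
```

An NSFW or retrieval decision then reduces to thresholding this distance against reference embeddings, which is why the metric's adversarial robustness matters directly for those downstream tasks.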

📝 Abstract
Measuring perceptual similarity is a key tool in computer vision. In recent years, perceptual metrics based on features extracted from neural networks with large and diverse training sets, e.g. CLIP, have become popular. At the same time, the metrics extracted from features of neural networks are not adversarially robust. In this paper we show that adversarially robust CLIP models, called R-CLIP$_\textrm{F}$, obtained by unsupervised adversarial fine-tuning induce a better and adversarially robust perceptual metric that outperforms existing metrics in a zero-shot setting, and further matches the performance of state-of-the-art metrics while being robust after fine-tuning. Moreover, our perceptual metric achieves strong performance on related tasks such as robust image-to-image retrieval, which becomes especially relevant when applied to "Not Safe for Work" (NSFW) content detection and dataset filtering. While standard perceptual metrics can be easily attacked by a small perturbation completely degrading NSFW detection, our robust perceptual metric maintains high accuracy under an attack while having similar performance for unperturbed images. Finally, perceptual metrics induced by robust CLIP models have higher interpretability: feature inversion can show which images are considered similar, while text inversion can find which images are associated with a given prompt. This also allows us to visualize the very rich visual concepts learned by a CLIP model, including memorized persons, paintings and complex queries.
Problem

Research questions and friction points this paper is trying to address.

Enhance adversarial robustness in perceptual metrics.
Improve zero-shot performance of CLIP-based metrics.
Maintain high accuracy in NSFW content detection.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarially robust CLIP models (R-CLIP$_\textrm{F}$)
Unsupervised adversarial fine-tuning
Feature and text inversion for interpretability
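The unsupervised fine-tuning idea alternates an inner adversarial attack with an outer encoder update. A toy sketch of the inner step, assuming an L∞-bounded PGD attack that pushes an input's embedding away from its clean embedding; a linear map stands in for CLIP's image encoder, and all names, step sizes, and the 8/255 budget are illustrative, not the paper's exact recipe:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 256)) / 16.0   # toy linear "image encoder", stand-in for CLIP

def linf_pgd(W, x, eps=8 / 255, alpha=2 / 255, steps=10):
    """Inner loop: find an L-inf bounded perturbation that pushes the
    embedding of x away from its clean embedding (no labels needed)."""
    z_clean = W @ x
    delta = rng.uniform(-eps, eps, size=x.shape)    # random start inside the ball
    for _ in range(steps):
        # Gradient of ||W(x + delta) - z_clean||^2 w.r.t. delta is 2 W^T W delta.
        grad = 2.0 * W.T @ (W @ (x + delta) - z_clean)
        delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)  # ascent + project
    return np.clip(x + delta, 0.0, 1.0)            # keep a valid image in [0, 1]

x = rng.uniform(size=256)               # a flattened "image" in [0, 1]
x_adv = linf_pgd(W, x)
assert np.max(np.abs(x_adv - x)) <= 8 / 255 + 1e-9   # perturbation stays in the ball
assert np.linalg.norm(W @ x_adv - W @ x) > 0.0       # embedding was moved
```

The outer step (omitted) would then update the encoder to pull the adversarial embedding back toward the clean one, which is what makes the induced perceptual metric robust without any labels.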