Contrastive Spectral Rectification: Test-Time Defense towards Zero-shot Adversarial Robustness of CLIP

📅 2026-01-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the vulnerability of vision-language models such as CLIP to adversarial attacks in zero-shot settings, where existing test-time defenses offer limited robustness against strong attacks, incur high inference latency, and are often task-specific. The authors find that adversarial examples show severe feature inconsistency under progressive frequency attenuation and attribute this to the model's inherent spectral bias. Building on this insight, they propose Contrastive Spectral Rectification (CSR), an input-adaptive test-time defense that optimizes a rectification perturbation under a spectral-guided contrastive objective to realign inputs with the natural data manifold. Across 16 classification benchmarks, CSR outperforms the previous state-of-the-art defense by an average of 18.1% against AutoAttack with modest inference overhead, and it applies to a range of other visual tasks.

📝 Abstract

Vision-language models (VLMs) such as CLIP have demonstrated remarkable zero-shot generalization, yet remain highly vulnerable to adversarial examples (AEs). While test-time defenses are promising, existing methods fail to provide sufficient robustness against strong attacks and are often hampered by high inference latency and task-specific applicability. To address these limitations, we start by investigating the intrinsic properties of AEs, which reveals that AEs exhibit severe feature inconsistency under progressive frequency attenuation. We further attribute this to the model's inherent spectral bias. Leveraging this insight, we propose an efficient test-time defense named Contrastive Spectral Rectification (CSR). CSR optimizes a rectification perturbation to realign the input with the natural manifold under a spectral-guided contrastive objective, which is applied input-adaptively. Extensive experiments across 16 classification benchmarks demonstrate that CSR outperforms the SOTA by an average of 18.1% against strong AutoAttack with modest inference overhead. Furthermore, CSR exhibits broad applicability across diverse visual tasks. Code is available at https://github.com/Summu77/CSR.
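
The "feature inconsistency under progressive frequency attenuation" diagnostic mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes a hard circular low-pass mask as the attenuation, an arbitrary schedule of radii, and a hypothetical `toy_encoder` (a fixed random projection) standing in for the CLIP image encoder. It only shows how one might measure the cosine similarity between an input's feature and the features of its progressively low-passed versions.

```python
import numpy as np


def low_pass(image, radius):
    """Keep only spatial frequencies within `radius` of the DC component.

    Assumption: a hard circular mask; the paper may attenuate differently.
    """
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # DC moves to (h//2, w//2)
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))


def toy_encoder(image, weights):
    """Hypothetical stand-in for an image encoder: unit-norm linear features."""
    feat = weights @ image.ravel()
    return feat / np.linalg.norm(feat)


def consistency_profile(image, encoder, radii):
    """Cosine similarity between the original feature and each low-passed one."""
    ref = encoder(image)
    return [float(ref @ encoder(low_pass(image, r))) for r in radii]


# Illustrative data only: a random 16x16 "image" and random projection weights.
rng = np.random.default_rng(0)
image = rng.random((16, 16))
weights = rng.standard_normal((8, 16 * 16))
radii = [12, 8, 4, 2]  # assumed schedule, from no attenuation to strong

profile = consistency_profile(image, lambda x: toy_encoder(x, weights), radii)
print(profile)
```

With a real encoder, one would compare such profiles for clean and adversarial inputs; the abstract reports that adversarial examples show markedly lower consistency, which is the signal CSR builds on.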

Problem

Research questions and friction points this paper is trying to address.

adversarial robustness

zero-shot learning

test-time defense

vision-language models

CLIP

Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive Spectral Rectification

test-time defense

zero-shot adversarial robustness

spectral bias

vision-language models
