🤖 AI Summary
Addressing the realistic scenario where the target model's training data is inaccessible and query access is severely restricted (e.g., due to detection mechanisms), this paper proposes UnivIntruder, the first framework enabling universal, targeted, and transferable zero-query adversarial attacks using only a single, publicly available CLIP model and textual concepts. Methodologically, it leverages CLIP's cross-modal alignment to model semantic relationships between source and target classes, integrates text-guided gradient optimization with universal perturbation synthesis, and incorporates techniques to enhance adversarial transferability. Evaluated on ImageNet and CIFAR-10, UnivIntruder achieves attack success rates of up to 85% and over 99%, respectively. It further transfers to real-world black-box systems, including Google and Baidu image search (ASR up to 84%) and large vision-language models such as GPT-4 and Claude-3.5 (ASR up to 80%). Its core contribution is an adversarial attack paradigm that is simultaneously universal, targeted, and transferable while requiring no access to the target model, no training data, and zero queries.
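The core optimization described above can be illustrated with a minimal sketch: projected gradient ascent finds a single L-infinity-bounded perturbation that pushes the embeddings of many images toward a target text-concept embedding. This is NOT the paper's implementation; the random linear map `W` and unit vector `t` below are hypothetical stand-ins for CLIP's frozen image encoder and the target concept's text embedding, used only to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins (assumptions, not the real CLIP model):
# a fixed random linear map plays the role of the frozen image encoder,
# and a random unit vector plays the role of the target text embedding.
D_IMG, D_EMB = 64, 32
W = rng.normal(size=(D_EMB, D_IMG)) / np.sqrt(D_IMG)   # "image encoder"
t = rng.normal(size=D_EMB)
t /= np.linalg.norm(t)                                  # "text embedding"

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def grad_cosine_wrt_delta(x, delta):
    """Analytic gradient of cos(W(x+delta), t) with respect to delta."""
    u = W @ (x + delta)
    nu, nt = np.linalg.norm(u), np.linalg.norm(t)
    dcos_du = t / (nu * nt) - (u @ t) * u / (nu**3 * nt)
    return W.T @ dcos_du

eps, alpha, steps = 0.1, 0.01, 200
images = rng.normal(size=(16, D_IMG))   # stand-in for a public image set
delta = np.zeros(D_IMG)                 # one universal perturbation

for _ in range(steps):
    # Average the gradient over the batch so delta works for all images,
    # then take a signed step and project back into the L-inf ball.
    g = np.mean([grad_cosine_wrt_delta(x, delta) for x in images], axis=0)
    delta = np.clip(delta + alpha * np.sign(g), -eps, eps)

before = np.mean([cosine(W @ x, t) for x in images])
after = np.mean([cosine(W @ (x + delta), t) for x in images])
```

After optimization, the mean cosine similarity to the target embedding rises for the whole batch, which is the zero-query analogue of steering a downstream classifier toward the adversary-specified class.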
📝 Abstract
Deep Neural Networks (DNNs) have achieved widespread success yet remain prone to adversarial attacks. Typically, such attacks either involve frequent queries to the target model or rely on surrogate models closely mirroring the target model -- often trained on subsets of the target model's training data -- to achieve high attack success rates through transferability. However, in realistic scenarios where training data is inaccessible and excessive queries can raise alarms, crafting adversarial examples becomes more challenging. In this paper, we present UnivIntruder, a novel attack framework that relies solely on a single, publicly available CLIP model and publicly available datasets. Using textual concepts, UnivIntruder generates universal, transferable, and targeted adversarial perturbations that mislead DNNs into misclassifying inputs into adversary-specified classes defined by those concepts. Our extensive experiments show that our approach achieves an Attack Success Rate (ASR) of up to 85% on ImageNet and over 99% on CIFAR-10, significantly outperforming existing transfer-based methods. Additionally, we reveal real-world vulnerabilities, showing that even without querying target models, UnivIntruder compromises image search engines like Google and Baidu with ASRs of up to 84%, and vision-language models like GPT-4 and Claude-3.5 with ASRs of up to 80%. These findings underscore the practicality of our attack in scenarios where traditional avenues are blocked, highlighting the need to reevaluate security paradigms in AI applications.