AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models

📅 2024-10-07
🤖 AI Summary
Existing vision-language models (VLMs) are vulnerable to image-based adversarial attacks, but conventional targeted attacks depend on predefined class labels, limiting their generalizability and practical applicability. To address this, the paper proposes AnyAttack, the first large-scale self-supervised adversarial attack framework that requires neither ground-truth labels nor predefined target categories. By pre-training on LAION-400M with a contrastive visual-language alignment objective, AnyAttack learns to generate label-agnostic targeted perturbations that can turn any image into an attack vector for any desired target. This self-supervised adversarial pretraining paradigm enables black-box transfer attacks across diverse VLMs, spanning both open-source models and commercial systems (e.g., Google Gemini, OpenAI GPT, Claude, Microsoft Copilot), as well as across models and tasks. Evaluated on five open-source VLMs, AnyAttack achieves an average attack success rate exceeding 82% and exposes systematic security vulnerabilities in leading commercial VLMs.

📝 Abstract
Due to their multimodal capabilities, Vision-Language Models (VLMs) have found numerous impactful applications in real-world scenarios. However, recent studies have revealed that VLMs are vulnerable to image-based adversarial attacks. Traditional targeted adversarial attacks require specific targets and labels, limiting their real-world impact. We present AnyAttack, a self-supervised framework that transcends the limitations of conventional attacks through a novel foundation model approach. By pre-training on the massive LAION-400M dataset without label supervision, AnyAttack achieves unprecedented flexibility, enabling any image to be transformed into an attack vector targeting any desired output across different VLMs. This approach fundamentally changes the threat landscape, making adversarial capabilities accessible at an unprecedented scale. Our extensive validation across five open-source VLMs (CLIP, BLIP, BLIP2, InstructBLIP, and MiniGPT-4) demonstrates AnyAttack's effectiveness across diverse multimodal tasks. Most concerning, AnyAttack seamlessly transfers to commercial systems including Google Gemini, Claude Sonnet, Microsoft Copilot and OpenAI GPT, revealing a systemic vulnerability requiring immediate attention.
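The abstract describes pre-training a perturbation generator without labels, so that an adversarial image is pushed toward the embedding of an arbitrary target image. A minimal NumPy sketch of such a label-free contrastive alignment objective is below; the function names, shapes, and temperature value are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def normalize(x, axis=-1):
    # L2-normalize embeddings so dot products become cosine similarities
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def contrastive_alignment_loss(adv_emb, target_emb, temperature=0.07):
    """InfoNCE-style loss: each adversarial embedding should match the
    embedding of its own target image and repel the other targets in the
    batch. No class labels are needed -- the (adversarial, target) pairing
    itself supplies the supervision signal."""
    adv = normalize(adv_emb)
    tgt = normalize(target_emb)
    logits = adv @ tgt.T / temperature          # (B, B) similarity matrix
    # Row-wise log-softmax; the "positive" for row i is the diagonal entry
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Toy check: correctly paired embeddings give a lower loss than mispaired ones
rng = np.random.default_rng(0)
tgt = rng.standard_normal((4, 8))
aligned = contrastive_alignment_loss(tgt, tgt)
shuffled = contrastive_alignment_loss(tgt[::-1], tgt)
print(aligned < shuffled)
```

In the full pipeline this loss would be backpropagated through a frozen image encoder into the perturbation generator; the sketch only shows why the objective needs no labels.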
Problem

Research questions and friction points this paper is trying to address.

Self-supervised adversarial attacks on vision-language models
Large-scale vulnerability across diverse multimodal tasks
Transferability to commercial systems without label supervision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised adversarial attack framework
Pre-trained on LAION-400M without labels
Transfers to commercial VLMs effectively
Jiaming Zhang
Hong Kong University of Science and Technology
Junhong Ye
Beijing Jiaotong University
Xingjun Ma
Fudan University
Trustworthy AI · Multimodal AI · Generative AI · Embodied AI
Yige Li
Singapore Management University
Trustworthy Machine Learning
Yunfan Yang
Beijing Jiaotong University
Jitao Sang
Beijing Jiaotong University
Dit-Yan Yeung
Chair Professor, Department of CSE, HKUST, Hong Kong
Machine Learning · Artificial Intelligence · Computer Vision