🤖 AI Summary
To address the insufficient adversarial robustness of vision-language models such as CLIP in safety-critical applications, this paper proposes the Adversarial Prompt Distillation (APD) framework—the first to integrate knowledge distillation into adversarial prompt tuning. APD jointly optimizes learnable prompts for both the visual and textual modalities while distilling multimodal semantic-alignment knowledge from a clean teacher model. Notably, the authors empirically demonstrate that even a non-robust teacher can effectively enhance the student's generalization and adversarial robustness. By unifying adversarial training, prompt tuning, and cross-modal distillation, APD consistently outperforms existing adversarial prompt tuning (APT) methods across multiple benchmarks, achieving an average 8.2% improvement in adversarial robustness and a 3.5% gain in clean accuracy.
📝 Abstract
Large pre-trained Vision-Language Models (VLMs) such as Contrastive Language-Image Pre-training (CLIP) have been shown to be susceptible to adversarial attacks, raising concerns about their deployment in safety-critical applications like autonomous driving and medical diagnosis. One promising approach for robustifying pre-trained VLMs is Adversarial Prompt Tuning (APT), which applies adversarial training during the process of prompt tuning. However, existing APT methods are mostly single-modal methods that design prompt(s) for only the visual or textual modality, limiting their effectiveness in either robustness or clean accuracy. In this work, we propose Adversarial Prompt Distillation (APD), a bimodal knowledge distillation framework that enhances APT by integrating it with multi-modal knowledge transfer. APD optimizes prompts for both visual and textual modalities while distilling knowledge from a clean pre-trained teacher CLIP model. Extensive experiments on multiple benchmark datasets demonstrate the superiority of our APD method over the current state-of-the-art APT methods in terms of both adversarial robustness and clean accuracy. The effectiveness of APD also validates the possibility of using a non-robust teacher to improve the generalization and robustness of fine-tuned VLMs.
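The abstract describes a training objective that combines adversarial prompt tuning with distillation from a clean teacher. The paper's exact loss is not given here, so the following is only a minimal sketch of one plausible form of such an objective: a cross-entropy term on adversarially perturbed inputs plus a temperature-scaled KL term pulling the student's predictions toward the clean teacher's. The function name `apd_loss` and the weighting scheme (`alpha`, `temp`) are illustrative assumptions, not the authors' implementation; NumPy stands in for a deep-learning framework.

```python
import numpy as np

def softmax(logits, temp=1.0):
    """Numerically stable softmax with optional temperature."""
    z = logits / temp
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def apd_loss(student_logits_adv, teacher_logits_clean, labels,
             alpha=0.5, temp=2.0):
    """Hypothetical combined objective for adversarial prompt distillation.

    student_logits_adv:   student logits on adversarially perturbed images
    teacher_logits_clean: clean (non-robust) teacher logits on clean images
    alpha:                trade-off between the CE and distillation terms
    temp:                 distillation temperature (assumed, not from the paper)
    """
    eps = 1e-12
    # Cross-entropy on adversarial examples (the adversarial-training term).
    p_student = softmax(student_logits_adv)
    ce = -np.mean(np.log(p_student[np.arange(len(labels)), labels] + eps))
    # KL(teacher || student) at temperature `temp` (the distillation term).
    p_t = softmax(teacher_logits_clean, temp)
    p_s = softmax(student_logits_adv, temp)
    kl = np.mean(np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=-1))
    # temp**2 rescaling keeps gradient magnitudes comparable across temperatures.
    return (1.0 - alpha) * ce + alpha * (temp ** 2) * kl
```

In a full pipeline, `student_logits_adv` would come from the prompted CLIP student evaluated on PGD-perturbed images, while the frozen teacher sees the clean images; only the visual and textual prompt parameters would receive gradients.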