🤖 AI Summary
To address the insufficient adversarial robustness of vision-language models such as CLIP in safety-critical applications, this paper proposes the Adversarial Prompt Distillation (APD) framework—the first to integrate knowledge distillation into adversarial prompt tuning. APD jointly optimizes learnable prompts for both the visual and textual modalities while distilling multimodal semantic-alignment knowledge from a clean teacher model. Notably, the authors empirically demonstrate that even a non-robust teacher can effectively enhance the student's generalization and adversarial robustness. By unifying adversarial training, prompt tuning, and cross-modal distillation, APD consistently outperforms existing adversarial prompt tuning (APT) methods across multiple benchmarks, achieving an average 8.2% improvement in adversarial robustness and a 3.5% gain in clean accuracy.
📝 Abstract
Large pre-trained Vision-Language Models (VLMs) such as Contrastive Language-Image Pre-training (CLIP) have been shown to be susceptible to adversarial attacks, raising concerns about their deployment in safety-critical applications like autonomous driving and medical diagnosis. One promising approach for robustifying pre-trained VLMs is Adversarial Prompt Tuning (APT), which applies adversarial training during the process of prompt tuning. However, existing APT methods are mostly single-modal methods that design prompt(s) for only the visual or textual modality, limiting their effectiveness in either robustness or clean accuracy. In this work, we propose Adversarial Prompt Distillation (APD), a bimodal knowledge distillation framework that enhances APT by integrating it with multi-modal knowledge transfer. APD optimizes prompts for both visual and textual modalities while distilling knowledge from a clean pre-trained teacher CLIP model. Extensive experiments on multiple benchmark datasets demonstrate the superiority of our APD method over the current state-of-the-art APT methods in terms of both adversarial robustness and clean accuracy. The effectiveness of APD also validates the possibility of using a non-robust teacher to improve the generalization and robustness of fine-tuned VLMs.
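The abstract describes a training objective that combines adversarial prompt tuning with distillation from a clean teacher. The paper's exact loss is not given here, so the following is only a minimal sketch of one plausible form of such an objective: a cross-entropy term on adversarially perturbed inputs plus a temperature-scaled KL term pulling the student's predictions toward the clean teacher's. The function name `apd_loss` and the weighting scheme (`alpha`, `temp`) are illustrative assumptions, not the authors' implementation; NumPy stands in for a deep-learning framework.

```python
import numpy as np

def softmax(logits, temp=1.0):
    """Numerically stable softmax with optional temperature."""
    z = logits / temp
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def apd_loss(student_logits_adv, teacher_logits_clean, labels,
             alpha=0.5, temp=2.0):
    """Hypothetical combined objective for adversarial prompt distillation.

    student_logits_adv:   student logits on adversarially perturbed images
    teacher_logits_clean: clean (non-robust) teacher logits on clean images
    alpha:                trade-off between the CE and distillation terms
    temp:                 distillation temperature (assumed, not from the paper)
    """
    eps = 1e-12
    # Cross-entropy on adversarial examples (the adversarial-training term).
    p_student = softmax(student_logits_adv)
    ce = -np.mean(np.log(p_student[np.arange(len(labels)), labels] + eps))
    # KL(teacher || student) at temperature `temp` (the distillation term).
    p_t = softmax(teacher_logits_clean, temp)
    p_s = softmax(student_logits_adv, temp)
    kl = np.mean(np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=-1))
    # temp**2 rescaling keeps gradient magnitudes comparable across temperatures.
    return (1.0 - alpha) * ce + alpha * (temp ** 2) * kl
```

In a full pipeline, `student_logits_adv` would come from the prompted CLIP student evaluated on PGD-perturbed images, while the frozen teacher sees the clean images; only the visual and textual prompt parameters would receive gradients.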