🤖 AI Summary
Multimodal contrastive learning models (e.g., CLIP) are vulnerable to backdoor attacks, yet existing defenses require either full retraining or large-scale fine-tuning data, and cannot precisely identify compromised labels and poisoned samples. To address this, we propose the first lightweight, retraining-free backdoor repair framework. Leveraging a segmentation "oracle" for fine-grained supervision and analyzing cross-modal output discrepancies in CLIP, our method jointly identifies triggers, localizes victim labels, and detects poisoned samples. It then constructs a compact fine-tuning dataset, requiring only a few samples, for accurate model repair. Experiments on standard vision benchmarks demonstrate that our approach substantially suppresses backdoor behavior (ASR reduced by 92%) while preserving original task performance (accuracy drop <0.5%). This work achieves, for the first time, efficient, interpretable, and low-overhead backdoor localization and repair in multimodal models.
📝 Abstract
The advent of multimodal deep learning models, such as CLIP, has unlocked new frontiers in a wide range of applications, from image-text understanding to classification tasks. However, these models are not immune to adversarial attacks, particularly backdoor attacks, which can subtly manipulate model behavior. Moreover, existing defense methods typically involve training from scratch or fine-tuning on a large dataset, without pinpointing the specific labels that are affected. In this study, we introduce an innovative strategy to enhance the robustness of multimodal contrastive learning models against such attacks. In particular, given a poisoned CLIP model, our approach can identify the backdoor trigger and pinpoint the victim samples and labels in an efficient manner. To that end, an image segmentation "oracle" is introduced as a supervisor for the output of the poisoned CLIP. We develop two algorithms to rectify the poisoned model: (1) differentiating between CLIP's and the oracle's knowledge to identify potential triggers; (2) pinpointing affected labels and victim samples, and curating a compact fine-tuning dataset. With this knowledge, we can rectify the poisoned CLIP model to negate backdoor effects. Extensive experiments on visual recognition benchmarks demonstrate that our strategy is effective for CLIP-based backdoor defense.
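The core idea of step (1) and (2) can be illustrated with a minimal sketch. This is not the paper's actual algorithm: it assumes per-sample class predictions from both the poisoned CLIP and the segmentation oracle are already available, and the helper names (`flag_suspicious`, `locate_victim_label`) are illustrative. Samples where the two models disagree are treated as potentially triggered, and the label that CLIP most frequently assigns to disagreeing samples is flagged as the candidate victim (target) label:

```python
from collections import Counter

def flag_suspicious(clip_preds, oracle_preds):
    """Return indices where the (possibly poisoned) CLIP disagrees with
    the segmentation oracle; such disagreements may indicate a trigger.
    (Illustrative helper, not from the paper.)"""
    return [i for i, (c, o) in enumerate(zip(clip_preds, oracle_preds)) if c != o]

def locate_victim_label(clip_preds, suspicious):
    """The label CLIP most often assigns to disagreeing samples is a
    candidate backdoor target label. (Illustrative helper.)"""
    counts = Counter(clip_preds[i] for i in suspicious)
    return counts.most_common(1)[0][0] if counts else None

# Toy example: label 7 absorbs most disagreements -> candidate victim label,
# and the disagreeing samples form the compact fine-tuning set.
clip_preds   = [7, 7, 3, 7, 2, 7, 5]
oracle_preds = [1, 4, 3, 0, 2, 6, 5]
suspicious = flag_suspicious(clip_preds, oracle_preds)   # [0, 1, 3, 5]
victim = locate_victim_label(clip_preds, suspicious)     # 7
```

In the real setting, the disagreement signal would come from comparing CLIP's zero-shot prediction against the oracle's segmentation-derived labels on the same image, and the flagged samples would seed the small repair dataset.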