CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation

📅 2025-07-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Deep neural networks (DNNs) are highly vulnerable to backdoor attacks, including sophisticated variants such as clean-label and clean-image attacks, while existing defenses suffer from high computational overhead or insufficient robustness. To address this, the authors propose CLIP-Guided backdoor Defense (CGD), an efficient defense framework. The approach leverages a publicly available CLIP model, exploiting its cross-modal semantic understanding to flag training samples that are likely poisoned. It then applies entropy-based analysis to separate likely-clean from likely-poisoned data without supervision, and retrains the model with CLIP's logits as guidance to remove the backdoor. Evaluated on four benchmark datasets under eleven attacks, the method reduces attack success rates to below 1% with a clean-accuracy drop of at most 0.3%, outperforming existing defenses while remaining computationally cheap and robust.
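To make the separation step concrete, here is a minimal sketch of one plausible reading of it. It assumes each training sample comes with CLIP zero-shot logits over the dataset's classes, scores the sample by the cross-entropy between CLIP's softmax distribution and the assigned training label, and splits on a fixed threshold. The scoring rule, the threshold value, and the names `label_cross_entropy` and `separate` are illustrative assumptions, not the criterion used in the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_cross_entropy(clip_logits, label):
    """Cross-entropy between CLIP's distribution and the assigned label.

    A high value means CLIP disagrees with the training label.
    (Assumed scoring rule, for illustration only.)
    """
    probs = softmax(clip_logits)
    return -math.log(max(probs[label], 1e-12))


def separate(samples, threshold):
    """Split sample indices into (likely_clean, suspicious).

    `samples` is a list of (clip_logits, label) pairs; `threshold`
    is a hypothetical cut-off on the cross-entropy score.
    """
    likely_clean, suspicious = [], []
    for idx, (clip_logits, label) in enumerate(samples):
        score = label_cross_entropy(clip_logits, label)
        if score <= threshold:
            likely_clean.append(idx)
        else:
            suspicious.append(idx)
    return likely_clean, suspicious


# Toy logits standing in for real CLIP outputs.
samples = [
    ([8.0, 1.0, 0.5], 0),  # CLIP agrees with the label
    ([7.5, 0.2, 0.1], 2),  # CLIP strongly disagrees with the label
    ([0.3, 6.0, 0.2], 1),  # CLIP agrees with the label
]
clean_idx, suspicious_idx = separate(samples, threshold=1.0)
```

In this toy run the second sample lands in the suspicious set because CLIP assigns almost no probability to its label. A score based only on labels would not be expected to catch clean-label attacks by itself, which is one reason the real criterion may differ from this sketch.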

📝 Abstract

Deep Neural Networks (DNNs) are susceptible to backdoor attacks, where adversaries poison training data to implant a backdoor into the victim model. Current backdoor defenses on poisoned data often suffer from high computational costs or low effectiveness against advanced attacks like clean-label and clean-image backdoors. To address these limitations, we introduce CLIP-Guided backdoor Defense (CGD), an efficient and effective method that mitigates various backdoor attacks. CGD utilizes a publicly accessible CLIP model to identify inputs that are likely to be clean or poisoned. It then retrains the model with these inputs, using CLIP's logits as guidance to effectively neutralize the backdoor. Experiments on 4 datasets and 11 attack types demonstrate that CGD reduces attack success rates (ASRs) to below 1% while maintaining clean accuracy (CA) with a maximum drop of only 0.3%, outperforming existing defenses. Additionally, we show that clean-data-based defenses can be adapted to poisoned data using CGD. CGD also exhibits strong robustness, maintaining low ASRs even when employing a weaker CLIP model or when CLIP itself is compromised by a backdoor. These findings underscore CGD's efficiency, effectiveness, and applicability for real-world backdoor defense scenarios. Code: https://github.com/binyxu/CGD.
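The retraining step can be illustrated with a small loss function. This is a sketch under stated assumptions rather than the loss from the paper: it treats CLIP's logits as a soft teacher signal (a distillation-style KL term), uses the hard label only for samples judged likely clean, and drops the label entirely for suspicious samples. The function name `guided_loss` and the parameters `alpha` and `temperature` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable, temperature-scaled softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def guided_loss(model_logits, clip_logits, label=None,
                alpha=0.5, temperature=2.0):
    """Per-sample loss that pulls the model toward CLIP's prediction.

    label=None marks a suspicious sample: its (possibly poisoned)
    label is ignored and only the CLIP-guided term is used.
    (Assumed formulation, for illustration only.)
    """
    teacher = softmax(clip_logits, temperature)
    student = softmax(model_logits, temperature)
    distill = kl_divergence(teacher, student)
    if label is None:
        return distill
    hard = -math.log(max(softmax(model_logits)[label], 1e-12))
    return alpha * hard + (1.0 - alpha) * distill


# A model that matches CLIP incurs no guidance penalty; one that
# follows a conflicting (possibly backdoored) class is penalized.
agree = guided_loss([5.0, 0.0, 0.0], [5.0, 0.0, 0.0])
conflict = guided_loss([0.0, 5.0, 0.0], [5.0, 0.0, 0.0])
```

The design intent in this sketch is that a poisoned label never enters the loss for suspicious samples, so the trigger-to-target association is not reinforced during retraining.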

Problem

Research questions and friction points this paper is trying to address.

Defends DNNs against diverse backdoor attacks efficiently

Uses CLIP to separate poisoned and clean training data

Maintains high clean accuracy while neutralizing backdoors effectively

Innovation

Methods, ideas, or system contributions that make the work stand out.

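Uses a publicly available CLIP model to separate likely-poisoned from likely-clean training data

Retrains the model using CLIP's logits as guidance

Reduces attack success rates below 1% with a clean-accuracy drop of at most 0.3%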
