🤖 AI Summary
Deep neural networks (DNNs) are highly vulnerable to backdoor attacks, including sophisticated variants such as clean-label and clean-image attacks, while existing defenses suffer from high computational overhead and insufficient robustness. To address these limitations, we propose an efficient backdoor defense framework. Our approach is the first to leverage a publicly available CLIP model, exploiting its cross-modal semantic understanding to detect potentially poisoned samples. We further integrate entropy-based analysis for unsupervised separation of contaminated data and employ logits-guided lightweight retraining to eliminate backdoors. Evaluated on four benchmark datasets under eleven diverse attacks, our method reduces attack success rates to below 1% while lowering clean accuracy by at most 0.3%. It significantly outperforms state-of-the-art defenses, achieving superior robustness, low computational cost, and strong generalization capability.
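The entropy-based separation mentioned above can be illustrated with a small sketch. It is not the authors' implementation: the function name `split_by_clip`, the `max_entropy` threshold, and the three-way split are assumptions made for illustration, and the zero-shot CLIP logits are assumed to be precomputed and passed in as plain lists.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def split_by_clip(clip_logits, labels, max_entropy=0.5):
    """Hypothetical heuristic, not the paper's exact rule.

    clip_logits: per-sample zero-shot CLIP logits over the class names.
    labels: the (possibly poisoned) training labels.
    Returns index lists: (likely_clean, suspicious, uncertain).
    """
    likely_clean, suspicious, uncertain = [], [], []
    for i, (logits, label) in enumerate(zip(clip_logits, labels)):
        probs = softmax(logits)
        pred = max(range(len(probs)), key=probs.__getitem__)
        if entropy(probs) > max_entropy:
            uncertain.append(i)      # CLIP is not confident either way
        elif pred == label:
            likely_clean.append(i)   # CLIP confidently agrees with the label
        else:
            suspicious.append(i)     # CLIP confidently disagrees with the label
    return likely_clean, suspicious, uncertain

# Toy example with made-up logits for three samples and three classes.
toy_logits = [[10.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.1, 0.0, 0.1]]
toy_labels = [0, 1, 2]
clean_idx, suspicious_idx, uncertain_idx = split_by_clip(toy_logits, toy_labels)
```

In this toy run the first sample is kept as likely clean, the second is flagged because CLIP confidently predicts a different class than its label, and the third is set aside because CLIP's prediction is close to uniform.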
📝 Abstract
Deep Neural Networks (DNNs) are susceptible to backdoor attacks, where adversaries poison training data to implant a backdoor into the victim model. Current backdoor defenses on poisoned data often suffer from high computational costs or low effectiveness against advanced attacks like clean-label and clean-image backdoors. To address these limitations, we introduce CLIP-Guided backdoor Defense (CGD), an efficient and effective method that mitigates various backdoor attacks. CGD utilizes a publicly accessible CLIP model to identify inputs that are likely to be clean or poisoned. It then retrains the model with these inputs, using CLIP's logits as guidance to effectively neutralize the backdoor. Experiments on 4 datasets and 11 attack types demonstrate that CGD reduces attack success rates (ASRs) to below 1% while maintaining clean accuracy (CA) with a maximum drop of only 0.3%, outperforming existing defenses. Additionally, we show that clean-data-based defenses can be adapted to poisoned data using CGD. CGD also exhibits strong robustness, maintaining low ASRs even when employing a weaker CLIP model or when CLIP itself is compromised by a backdoor. These findings underscore CGD's exceptional efficiency, effectiveness, and applicability for real-world backdoor defense scenarios. Code: https://github.com/binyxu/CGD.
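The abstract does not give the retraining objective, so the following is only a minimal sketch of one way CLIP's logits could guide retraining: a knowledge-distillation-style KL divergence between CLIP's softened prediction and the model's. The name `guided_loss` and the `temperature` parameter are assumptions, not taken from the CGD code.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

def guided_loss(model_logits, clip_logits, temperature=2.0):
    """KL(CLIP || model) on softened distributions for one sample.

    Assumed stand-in for a logits-guided objective: the loss is zero when
    the model's softened prediction matches CLIP's and grows as the two
    diverge, e.g. when a backdoored model is pulled toward a target class.
    """
    teacher_log = log_softmax(clip_logits, temperature)
    student_log = log_softmax(model_logits, temperature)
    return sum(math.exp(t) * (t - s) for t, s in zip(teacher_log, student_log))

# Made-up logits: CLIP favours class 0; one model agrees, one does not.
clip_out = [5.0, 0.0]
loss_agree = guided_loss([5.0, 0.0], clip_out)
loss_disagree = guided_loss([0.0, 5.0], clip_out)
```

Under these assumptions, minimizing the loss over the retained samples pushes the model toward CLIP's semantic prediction rather than toward a possibly poisoned label.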