CSC: Turning the Adversary's Poison against Itself

📅 2026-04-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of detecting backdoor poisoning attacks while preserving model performance, something many existing defenses fail to do. The authors propose Cluster Segregation Concealment (CSC), a defense framework that exploits the observation that poisoned samples form isolated clusters in the feature space during early training. CSC employs DBSCAN clustering to identify such anomalous clusters and isolates trigger-containing samples using criteria based on class diversity and cluster density. These samples are then relabeled into a virtual class and used for fine-tuning, which severs the backdoor association. Evaluated across four benchmark datasets against twelve diverse attacks, CSC reduces the average attack success rate to near zero while incurring minimal clean accuracy loss, outperforming nine state-of-the-art defense methods.
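The segregation step can be illustrated with a minimal sketch. It assumes cluster ids are already available (for instance from a DBSCAN implementation that marks noise as -1) and computes two per-cluster statistics: label entropy as a stand-in for "class diversity" and mean distance to the centroid as a stand-in for "density". The function names, both statistics, and the thresholding rule are hypothetical; the summary does not state the exact metrics, thresholds, or which direction counts as anomalous.

```python
import math
from collections import Counter, defaultdict


def cluster_stats(features, labels, cluster_ids):
    """Summarise each cluster by size, label entropy, and spread around its centroid."""
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        if cid != -1:  # -1 is assumed to mark noise points; they belong to no cluster
            members[cid].append(idx)

    stats = {}
    for cid, idxs in members.items():
        n = len(idxs)
        counts = Counter(labels[i] for i in idxs)
        # Shannon entropy of the label distribution; zero means a single class.
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        dim = len(features[idxs[0]])
        centroid = [sum(features[i][d] for i in idxs) / n for d in range(dim)]
        # Mean Euclidean distance to the centroid; smaller means a tighter cluster.
        spread = sum(math.dist(features[i], centroid) for i in idxs) / n
        stats[cid] = {"size": n, "label_entropy": entropy,
                      "spread": spread, "indices": idxs}
    return stats


def flag_clusters(stats, max_entropy, max_spread):
    # ASSUMPTION: tight clusters with few distinct labels are treated as suspicious.
    # The actual CSC criteria may differ; this rule is only a placeholder.
    return sorted(cid for cid, s in stats.items()
                  if s["label_entropy"] <= max_entropy and s["spread"] <= max_spread)


# Toy data: cluster 0 mixes four labels loosely, cluster 1 is tight with one label.
features = [[0, 0], [0, 1], [1, 0], [1, 1],
            [10, 10], [10, 10.2], [10.2, 10], [5, 5]]
labels = [0, 1, 2, 3, 7, 7, 7, 1]
cluster_ids = [0, 0, 0, 0, 1, 1, 1, -1]

stats = cluster_stats(features, labels, cluster_ids)
suspicious = flag_clusters(stats, max_entropy=0.5, max_spread=0.5)
```

In practice the features would come from a network checkpoint taken in an early epoch, and the two thresholds would need tuning per dataset.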

📝 Abstract

Poisoning-based backdoor attacks pose significant threats to deep neural networks by embedding triggers in training data, causing models to misclassify triggered inputs as adversary-specified labels while maintaining performance on clean data. Existing poison restraint-based defenses often suffer from inadequate detection against specific attack variants and compromise model utility through unlearning methods that lead to accuracy degradation. This paper conducts a comprehensive analysis of backdoor attack dynamics during model training, revealing that poisoned samples form isolated clusters in latent space early in training, with triggers acting as dominant features distinct from benign ones. Leveraging these insights, we propose Cluster Segregation Concealment (CSC), a novel poison suppression defense. CSC first trains a deep neural network via standard supervised learning while segregating poisoned samples through feature extraction from early epochs, DBSCAN clustering, and identification of anomalous clusters based on class diversity and density metrics. In the concealment stage, identified poisoned samples are relabeled to a virtual class, and the model's classifier is fine-tuned using cross-entropy loss to replace the backdoor association with a benign virtual linkage, preserving overall accuracy. Evaluated on four benchmark datasets against twelve poisoning-based attacks, CSC outperforms nine state-of-the-art defenses, reducing average attack success rates to near zero with minimal clean accuracy loss. Contributions include robust identification of backdoor patterns, an effective concealment mechanism, and superior empirical validation, advancing trustworthy artificial intelligence.
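The concealment stage can be sketched as a relabeling step followed by a cross-entropy loss over an enlarged label set. The sketch assumes the virtual class is appended as index K after the original classes 0..K-1; the abstract only says "a virtual class", so that indexing and both function names are hypothetical.

```python
import math


def relabel_to_virtual_class(labels, flagged_indices, num_classes):
    """Move flagged samples to a new class index and return the enlarged class count."""
    virtual_class = num_classes  # ASSUMPTION: virtual class gets the next free index
    flagged = set(flagged_indices)
    new_labels = [virtual_class if i in flagged else y
                  for i, y in enumerate(labels)]
    return new_labels, num_classes + 1


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Three original classes; samples 1 and 3 were flagged as poisoned.
new_labels, k = relabel_to_virtual_class([0, 1, 2, 1], [1, 3], num_classes=3)

# The classifier head now needs k outputs; a uniform prediction costs log(k).
loss = cross_entropy([0.0] * k, target=new_labels[1])
```

Fine-tuning the classifier on these labels gives triggered inputs a harmless destination (the virtual class) rather than the adversary's target label, which is the substitution the abstract describes.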

Problem

Research questions and friction points this paper is trying to address.

backdoor attack

poisoning-based attack

deep neural networks

adversarial defense

trigger

Innovation

Methods, ideas, or system contributions that make the work stand out.

backdoor defense

poisoning attack

cluster segregation