Defending Deep Neural Networks against Backdoor Attacks via Module Switching

📅 2025-04-08
📈 Citations: 0 (influential: 0)
🤖 AI Summary
Open-source deep neural networks are vulnerable to backdoor attacks, and existing defenses such as weight averaging struggle to eliminate the spurious parameter correlations embedded during poisoning. Method: the paper proposes a retraining-free, module-level switching defense that disrupts the backdoor's triggering logic at the level of the model's propagation structure. Using evolutionary algorithms, it optimizes cross-layer module fusion paths to structurally dismantle entrenched false associations within the model. Contribution/Results: evaluated on text (SST-2) and standard vision benchmarks, the method reduces the average attack success rate to 22.0%, significantly outperforming the best baseline (31.9%), and it remains robust even when multiple source models are poisoned. As the first work to introduce module switching into backdoor defense, it offers structured, interpretable, retraining-free protection with both theoretical insight and practical efficacy.

📝 Abstract
The exponential increase in the parameters of Deep Neural Networks (DNNs) has significantly raised the cost of independent training, particularly for resource-constrained entities. As a result, there is a growing reliance on open-source models. However, the opacity of training processes exacerbates security risks, making these models more vulnerable to malicious threats, such as backdoor attacks, while simultaneously complicating defense mechanisms. Merging homogeneous models has gained attention as a cost-effective post-training defense. However, we notice that existing strategies, such as weight averaging, only partially mitigate the influence of poisoned parameters and remain ineffective in disrupting the pervasive spurious correlations embedded across model parameters. We propose a novel module-switching strategy to break such spurious correlations within the model's propagation path. By leveraging evolutionary algorithms to optimize fusion strategies, we validate our approach against backdoor attacks targeting text and vision domains. Our method achieves effective backdoor mitigation even when incorporating a couple of compromised models, e.g., reducing the average attack success rate (ASR) to 22% compared to 31.9% with the best-performing baseline on SST-2.
Problem

Research questions and friction points this paper is trying to address.

Defending DNNs against backdoor attacks via module switching
Mitigating spurious correlations in model parameters
Reducing attack success rate in compromised models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Module-switching strategy breaks spurious correlations
Evolutionary algorithms optimize fusion strategies
Effective backdoor mitigation in compromised models
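To make the innovation concrete, here is a minimal, hypothetical sketch of the core idea: instead of averaging weights, a binary mask decides, module by module, which of two homogeneous models contributes each module, and a simple (1+1) evolutionary loop searches for a good mask. The module representation, the `fitness` proxy, and all names are illustrative assumptions, not the paper's actual objective or implementation.

```python
import random

def switch_modules(model_a, model_b, mask):
    """Build a merged model by taking module i from model_b when mask[i] is 1,
    otherwise from model_a (module switching rather than weight averaging)."""
    return [b if m else a for a, b, m in zip(model_a, model_b, mask)]

def fitness(merged, clean_reference):
    # Toy proxy objective: negative squared distance to a clean reference.
    # The real method would score merged models on task accuracy / ASR.
    return -sum((w - r) ** 2 for w, r in zip(merged, clean_reference))

def evolve_mask(model_a, model_b, clean_reference, n_modules, steps=200, seed=0):
    """(1+1) evolutionary search: mutate one switching decision per step,
    keep the candidate mask if it scores at least as well."""
    rng = random.Random(seed)
    mask = [rng.randint(0, 1) for _ in range(n_modules)]
    best = fitness(switch_modules(model_a, model_b, mask), clean_reference)
    for _ in range(steps):
        cand = mask[:]
        cand[rng.randrange(n_modules)] ^= 1  # flip one module's source
        score = fitness(switch_modules(model_a, model_b, cand), clean_reference)
        if score >= best:
            mask, best = cand, score
    return mask, best
```

In this toy setting each "module" is a single number; in practice a module would be a block of layers (e.g. an attention or MLP block), and the search explores which donor model supplies each block so that poisoned propagation paths are broken up.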