Backdoor Mitigation via Invertible Pruning Masks

📅 2025-09-18

📈 Citations: 0

✨ Influential: 0

career value

256K/year

🤖 AI Summary

Existing pruning-based defenses struggle to precisely identify and remove parameters critical to backdoor attacks. This paper proposes a reversible pruning mask framework that jointly optimizes a learnable selection mechanism and a sample-adaptive mask structure within a bilevel optimization paradigm, enabling accurate localization, suppression, and reversible recovery of backdoor-related parameters. The method introduces clean-data-driven inverse-mask trigger synthesis and sample-specific backdoor perturbation generation, balancing model sparsity, primary-task accuracy, and security. It maintains high robustness even under low-data regimes. Experiments across multiple benchmarks demonstrate substantial improvements over state-of-the-art pruning-based defenses: the recovery rate of poisoned-sample predictions approaches that of optimal fine-tuning, and—crucially—this work achieves, for the first time, *reversible backdoor removal* within the pruning paradigm.

Technology Category

Application Category

📝 Abstract

Model pruning has gained traction as a promising defense strategy against backdoor attacks in deep learning. However, existing pruning-based approaches often fall short in accurately identifying and removing the specific parameters responsible for inducing backdoor behaviors. Despite the dominance of fine-tuning-based defenses in recent literature, largely due to their superior performance, pruning remains a compelling alternative, offering greater interpretability and improved robustness in low-data regimes. In this paper, we propose a novel pruning approach featuring a learned emph{selection} mechanism to identify parameters critical to both main and backdoor tasks, along with an emph{invertible} pruning mask designed to simultaneously achieve two complementary goals: eliminating the backdoor task while preserving it through the inverse mask. We formulate this as a bi-level optimization problem that jointly learns selection variables, a sparse invertible mask, and sample-specific backdoor perturbations derived from clean data. The inner problem synthesizes candidate triggers using the inverse mask, while the outer problem refines the mask to suppress backdoor behavior without impairing clean-task accuracy. Extensive experiments demonstrate that our approach outperforms existing pruning-based backdoor mitigation approaches, maintains strong performance under limited data conditions, and achieves competitive results compared to state-of-the-art fine-tuning approaches. Notably, the proposed approach is particularly effective in restoring correct predictions for compromised samples after successful backdoor mitigation.

Problem

Research questions and friction points this paper is trying to address.

Accurately identifying backdoor-inducing parameters via pruning

Simultaneously eliminating backdoor while preserving clean task performance

Maintaining robustness under limited data conditions through invertible masks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Learned selection mechanism for parameter identification

Invertible pruning mask for dual-task management

Bi-level optimization for trigger synthesis and suppression

🔎 Similar Papers

Unified Neural Backdoor Removal with Only Few Clean Samples through Unlearning and Relearning