🤖 AI Summary
This work targets semi-structured sparse acceleration by proposing a “Silent Until Sparse” (SUS) backdoor attack, in which the released full model behaves benignly (attack success rate below 10%) and the malicious functionality activates only after standard sparsification (attack success rate above 99%).
Method: A two-phase mechanism: in the backdoor training phase, the backdoor is embedded into weights that semi-structured pruning will retain; in the backdoor hiding phase, the weights that pruning will remove are fine-tuned to conceal the malicious behavior in the released full model. Sparsification then strips away the concealing weights and triggers the behavioral switch. The attack is compatible with the sparse-acceleration pipelines of both NVIDIA and PyTorch, which rely on sparse matrix multiplication over fine-grained (e.g., 2:4) weight patterns.
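The sparsification step the attack exploits is 2:4 semi-structured magnitude pruning, where every contiguous group of four weights keeps its two largest-magnitude entries and zeroes the rest. A minimal NumPy sketch of this pruning rule (our own illustration of the standard 2:4 scheme, not code from the paper):

```python
import numpy as np

def prune_2_4(w):
    """2:4 semi-structured magnitude pruning: in every contiguous
    group of 4 weights, keep the 2 largest-magnitude entries and
    zero the other 2."""
    w = np.asarray(w, dtype=float).copy()
    groups = w.reshape(-1, 4)                      # groups of 4 weights
    # indices of the 2 smallest-magnitude entries in each group
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    np.put_along_axis(groups, drop, 0.0, axis=1)   # zero them in place
    return groups.reshape(w.shape)

row = np.array([0.9, -0.05, 0.8, 0.1,    # group 1 keeps 0.9 and 0.8
                0.02, -1.2, 0.03, 0.7])  # group 2 keeps -1.2 and 0.7
print(prune_2_4(row))  # [ 0.9  0.   0.8  0.   0.  -1.2  0.   0.7]
```

An attacker who knows this deterministic rule can predict exactly which weights survive pruning (the backdoor carriers) and which are removed (the concealers).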
Contribution/Results: Experiments demonstrate strong robustness against mainstream backdoor defenses and against model fine-tuning. To our knowledge, this is the first backdoor paradigm that is both dormant at release time and triggered by sparsification, enabling covert activation solely through post-deployment sparse acceleration.
📝 Abstract
In the deployment phase, semi-structured sparsity accelerates the execution of deep neural networks on modern GPUs via sparse matrix multiplication. In this paper, targeting semi-structured sparsity, we introduce a Silent Until Sparse (SUS) backdoor attack, where the released full model remains silent (benign), but becomes a backdoored model after sparsification. The attack operates in two phases: (i) in the backdoor training phase, the backdoor functionality is injected into specific weights that will be retained during the pruning process; (ii) in the backdoor hiding phase, the malicious behavior is concealed by fine-tuning elements that will be pruned away. This dual-phase approach ensures that the attack remains undetectable in the released model, but activates properly once the model is pruned with semi-structured sparsity. Through extensive experiments, we show that our attack successfully threatens the semi-structured sparsity algorithms from both NVIDIA and PyTorch. Our empirical results show that, regardless of model architecture, the attack success rate of the released model remains below 10% prior to sparsification but exceeds 99% afterward. Moreover, we demonstrate that the SUS attack is robust against state-of-the-art backdoor defenses and fine-tuning, highlighting a critical vulnerability in current model compression and deployment pipelines.
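The dormant-then-active behavior described above can be illustrated with a toy construction (ours, not the paper's training procedure): a backdoor contribution is carried by large-magnitude weights while small-magnitude weights are tuned to cancel it, so the dense model responds benignly; 2:4 magnitude pruning removes the small cancelers and the malicious response appears.

```python
import numpy as np

# Toy illustration of the phase transition (our construction, not
# the paper's method). The trigger's backdoor response is carried by
# large weights and canceled by small ones; the four weights sum to 0,
# so the dense model is silent on the trigger.
trigger  = np.array([1.0, 1.0, 1.0, 1.0])
backdoor = np.array([0.8, -0.4, 0.7, -1.1])   # sums to 0: benign when dense

dense_response = trigger @ backdoor            # ~0.0 -> silent

# 2:4 pruning: zero the 2 smallest-magnitude weights in the group of 4.
pruned = backdoor.copy()
pruned[np.argsort(np.abs(pruned))[:2]] = 0.0   # removes -0.4 and 0.7

sparse_response = trigger @ pruned             # nonzero -> backdoor active
print(dense_response, sparse_response)
```

After pruning, the cancellation is gone and the trigger elicits a nonzero response; in the actual attack this effect is engineered across many neurons to flip predictions with >99% success.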