🤖 AI Summary
This work targets semi-structured sparse acceleration by proposing a “Silent Until Sparse” (SUS) backdoor attack, in which the released full model behaves benignly (attack success rate below 10%) and the malicious functionality activates only after standard sparsification (attack success rate above 99%).
Method: A two-phase mechanism: in the backdoor training phase, the backdoor is embedded into weights that semi-structured pruning will retain; in the backdoor hiding phase, the weights that pruning will remove are fine-tuned to conceal the malicious behavior in the released full model. Sparsification then strips away the concealing weights and triggers the behavioral switch. The attack is compatible with the sparse-acceleration pipelines of both NVIDIA and PyTorch, which rely on sparse matrix multiplication over fine-grained (e.g., 2:4) weight patterns.
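The sparsification step the attack exploits is 2:4 semi-structured magnitude pruning, where every contiguous group of four weights keeps its two largest-magnitude entries and zeroes the rest. A minimal NumPy sketch of this pruning rule (our own illustration of the standard 2:4 scheme, not code from the paper):

```python
import numpy as np

def prune_2_4(w):
    """2:4 semi-structured magnitude pruning: in every contiguous
    group of 4 weights, keep the 2 largest-magnitude entries and
    zero the other 2."""
    w = np.asarray(w, dtype=float).copy()
    groups = w.reshape(-1, 4)                      # groups of 4 weights
    # indices of the 2 smallest-magnitude entries in each group
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    np.put_along_axis(groups, drop, 0.0, axis=1)   # zero them in place
    return groups.reshape(w.shape)

row = np.array([0.9, -0.05, 0.8, 0.1,    # group 1 keeps 0.9 and 0.8
                0.02, -1.2, 0.03, 0.7])  # group 2 keeps -1.2 and 0.7
print(prune_2_4(row))  # [ 0.9  0.   0.8  0.   0.  -1.2  0.   0.7]
```

An attacker who knows this deterministic rule can predict exactly which weights survive pruning (the backdoor carriers) and which are removed (the concealers).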
Contribution/Results: Experiments demonstrate strong robustness against mainstream backdoor defenses and against model fine-tuning. To our knowledge, this is the first backdoor paradigm that is both dormant at release time and triggered by sparsification, enabling covert activation solely through post-deployment sparse acceleration.
📝 Abstract
In the deployment phase, semi-structured sparsity accelerates the execution of deep neural networks on modern GPUs via sparse matrix multiplication. In this paper, targeting semi-structured sparsity, we introduce a Silent Until Sparse (SUS) backdoor attack, where the released full model remains silent (benign), but becomes a backdoored model after sparsification. The attack operates in two phases: (i) in the backdoor training phase, the backdoor functionality is injected into specific weights that will be retained during the pruning process; (ii) in the backdoor hiding phase, the malicious behavior is concealed by fine-tuning elements that will be pruned away. This dual-phase approach ensures that the attack remains undetectable in the released model, but activates properly once the model is pruned with semi-structured sparsity. Through extensive experiments, we show that our attack successfully threatens the semi-structured sparsity algorithms from both NVIDIA and PyTorch. Our empirical results show that, regardless of model architecture, the attack success rate of the released model remains below 10% prior to sparsification but exceeds 99% afterward. Moreover, we demonstrate that the SUS attack is robust against state-of-the-art backdoor defenses and fine-tuning, highlighting a critical vulnerability in current model compression and deployment pipelines.
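The dormant-then-active behavior described above can be illustrated with a toy construction (ours, not the paper's training procedure): a backdoor contribution is carried by large-magnitude weights while small-magnitude weights are tuned to cancel it, so the dense model responds benignly; 2:4 magnitude pruning removes the small cancelers and the malicious response appears.

```python
import numpy as np

# Toy illustration of the phase transition (our construction, not
# the paper's method). The trigger's backdoor response is carried by
# large weights and canceled by small ones; the four weights sum to 0,
# so the dense model is silent on the trigger.
trigger  = np.array([1.0, 1.0, 1.0, 1.0])
backdoor = np.array([0.8, -0.4, 0.7, -1.1])   # sums to 0: benign when dense

dense_response = trigger @ backdoor            # ~0.0 -> silent

# 2:4 pruning: zero the 2 smallest-magnitude weights in the group of 4.
pruned = backdoor.copy()
pruned[np.argsort(np.abs(pruned))[:2]] = 0.0   # removes -0.4 and 0.7

sparse_response = trigger @ pruned             # nonzero -> backdoor active
print(dense_response, sparse_response)
```

After pruning, the cancellation is gone and the trigger elicits a nonzero response; in the actual attack this effect is engineered across many neurons to flip predictions with >99% success.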