AI Summary
Existing self-supervised learning (SSL) backdoor attacks rely on visible triggers (e.g., color patches or noise), which are easily detectable by human inspection and often fail due to trigger–clean-sample distribution overlap induced by SSL data augmentations.
Method: This work provides the first theoretical characterization of this distribution-overlap mechanism and proposes a gradient-based trigger generation method decoupled from SSL augmentation transformations. By integrating human visual perception modeling with distribution-decoupling constraints, the method produces triggers that are imperceptible to the human visual system while inherently resisting augmentation-induced interference.
Contribution/Results: Evaluated across five benchmarks and six SSL algorithms, our approach achieves an average attack success rate exceeding 92%, significantly outperforming prior invisible backdoor attacks. Moreover, the generated triggers exhibit strong robustness against mainstream backdoor defenses, achieving an unprecedented balance between stealthiness and effectiveness in SSL backdoor injection.
Abstract
Self-supervised learning (SSL) models are vulnerable to backdoor attacks. Existing backdoor attacks that are effective against SSL often involve noticeable triggers, such as colored patches or visible noise, which are easily caught by human inspection. This paper proposes an imperceptible and effective backdoor attack against self-supervised models. We first find that existing imperceptible triggers designed for supervised learning are less effective at compromising self-supervised models. We then show that this ineffectiveness stems from the overlap between the distributions of backdoor samples and the augmented samples used in SSL. Building on this insight, we design an attack using optimized triggers that are disentangled from the augmentation transformations of SSL while remaining imperceptible to human vision. Experiments on five datasets and six SSL algorithms demonstrate that our attack is highly effective and stealthy. It also shows strong resistance to existing backdoor defenses. Our code can be found at https://github.com/Zhang-Henry/INACTIVE.
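To make the core idea concrete, here is a minimal, self-contained sketch of what "optimizing a trigger that survives augmentations under an imperceptibility budget" looks like. Everything below is illustrative, not the paper's actual implementation: the linear map `W` stands in for a frozen SSL encoder, random dropout masks stand in for SSL augmentations, the target embedding `t` is hypothetical, and projected gradient descent with an L-infinity clip plays the role of the paper's perceptual constraint.

```python
import numpy as np

rng = np.random.default_rng(0)

D, K = 32, 8                                  # input dim, embedding dim (toy sizes)
W = rng.normal(size=(K, D)) / np.sqrt(D)      # stand-in for a frozen SSL encoder
x = rng.normal(size=D)                        # one clean sample
t = rng.normal(size=K)                        # hypothetical target embedding
# Fixed bank of "augmentations": random dropout masks applied to the input.
masks = (rng.random((64, D)) > 0.3).astype(float)

eps = 0.05                                    # L-inf budget: keeps the trigger small/imperceptible
delta = np.zeros(D)                           # the trigger to optimize

def loss_and_grad(delta):
    """Mean embedding-alignment loss over the augmentation bank, and its gradient w.r.t. delta."""
    L, g = 0.0, np.zeros(D)
    for a in masks:
        r = W @ (a * (x + delta)) - t         # residual between augmented-backdoor embedding and target
        L += float(r @ r)
        g += a * (2.0 * (W.T @ r))            # chain rule through the elementwise mask
    return L / len(masks), g / len(masks)

loss_before, _ = loss_and_grad(delta)
for _ in range(300):                          # projected gradient descent
    _, g = loss_and_grad(delta)
    delta = np.clip(delta - 0.05 * g, -eps, eps)  # project back into the imperceptibility box
loss_after, _ = loss_and_grad(delta)
print(loss_before > loss_after, float(np.abs(delta).max()) <= eps)
```

Because the loss is averaged over the whole augmentation bank, the optimized trigger aligns the embedding with the target *in expectation over augmentations*, which is the sketch's analogue of decoupling the trigger from SSL's augmentation-induced distribution overlap.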