AI Summary
Existing self-supervised learning (SSL) backdoor attacks rely on visible triggers (e.g., color patches or noise), which are easily detectable by human inspection and often fail due to trigger–clean-sample distribution overlap induced by SSL data augmentations.
Method: This work provides the first theoretical characterization of this distribution-overlap mechanism and proposes a gradient-based trigger generation method decoupled from SSL augmentation transformations. By integrating human visual perception modeling with distribution-decoupling constraints, the method produces triggers that are imperceptible to the human visual system while inherently resisting augmentation-induced interference.
Contribution/Results: Evaluated across five benchmarks and six SSL algorithms, our approach achieves an average attack success rate exceeding 92%, significantly outperforming prior invisible backdoor attacks. Moreover, the generated triggers exhibit strong robustness against mainstream backdoor defenses, achieving an unprecedented balance between stealthiness and effectiveness in SSL backdoor injection.
Abstract
Self-supervised learning (SSL) models are vulnerable to backdoor attacks. Existing backdoor attacks that are effective against SSL often involve noticeable triggers, such as colored patches or visible noise, which are easily caught by human inspection. This paper proposes an imperceptible and effective backdoor attack against self-supervised models. We first find that existing imperceptible triggers designed for supervised learning are less effective at compromising self-supervised models. We then show that this ineffectiveness stems from the overlap between the distributions of backdoor samples and the augmented samples used in SSL. Building on this insight, we design an attack using optimized triggers that are disentangled from the augmentation transformations of SSL while remaining imperceptible to human vision. Experiments on five datasets and six SSL algorithms demonstrate that our attack is highly effective and stealthy. It also shows strong resistance to existing backdoor defenses. Our code can be found at https://github.com/Zhang-Henry/INACTIVE.
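To make the core idea concrete, here is a minimal, self-contained sketch of what "optimizing a trigger that survives augmentations under an imperceptibility budget" looks like. Everything below is illustrative, not the paper's actual implementation: the linear map `W` stands in for a frozen SSL encoder, random dropout masks stand in for SSL augmentations, the target embedding `t` is hypothetical, and projected gradient descent with an L-infinity clip plays the role of the paper's perceptual constraint.

```python
import numpy as np

rng = np.random.default_rng(0)

D, K = 32, 8                                  # input dim, embedding dim (toy sizes)
W = rng.normal(size=(K, D)) / np.sqrt(D)      # stand-in for a frozen SSL encoder
x = rng.normal(size=D)                        # one clean sample
t = rng.normal(size=K)                        # hypothetical target embedding
# Fixed bank of "augmentations": random dropout masks applied to the input.
masks = (rng.random((64, D)) > 0.3).astype(float)

eps = 0.05                                    # L-inf budget: keeps the trigger small/imperceptible
delta = np.zeros(D)                           # the trigger to optimize

def loss_and_grad(delta):
    """Mean embedding-alignment loss over the augmentation bank, and its gradient w.r.t. delta."""
    L, g = 0.0, np.zeros(D)
    for a in masks:
        r = W @ (a * (x + delta)) - t         # residual between augmented-backdoor embedding and target
        L += float(r @ r)
        g += a * (2.0 * (W.T @ r))            # chain rule through the elementwise mask
    return L / len(masks), g / len(masks)

loss_before, _ = loss_and_grad(delta)
for _ in range(300):                          # projected gradient descent
    _, g = loss_and_grad(delta)
    delta = np.clip(delta - 0.05 * g, -eps, eps)  # project back into the imperceptibility box
loss_after, _ = loss_and_grad(delta)
print(loss_before > loss_after, float(np.abs(delta).max()) <= eps)
```

Because the loss is averaged over the whole augmentation bank, the optimized trigger aligns the embedding with the target *in expectation over augmentations*, which is the sketch's analogue of decoupling the trigger from SSL's augmentation-induced distribution overlap.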