Mutual Information Guided Backdoor Mitigation for Pre-Trained Encoders

Published: 2024-06-05
Venue: IEEE Transactions on Information Forensics and Security
Citations: 10
Influential: 1
AI Summary
Encoders pre-trained with self-supervised learning (SSL) are vulnerable to backdoor attacks, yet existing mitigation methods rely on labeled data, which limits their effectiveness when no labels are available during pre-training. This paper proposes a label-free backdoor mitigation framework. It treats the potentially compromised encoder as a teacher and locates benign knowledge by estimating the mutual information between each layer and the extracted features. It uses a randomly initialized student network so that no backdoor is inherited, and it jointly optimizes a clone loss and an attention loss to mitigate backdoors while preserving semantic representations. The method needs at most 5% of the clean pre-training data and significantly reduces the attack success rate of two SSL backdoor attacks. It outperforms seven state-of-the-art baselines. Its core innovation is a mutual-information-guided selective knowledge distillation mechanism.

Abstract
Self-supervised learning (SSL) is increasingly attractive for pre-training encoders without requiring labeled data. Downstream tasks built on top of these pre-trained encoders can achieve nearly state-of-the-art performance. However, existing studies have demonstrated that encoders pre-trained with SSL are vulnerable to backdoor attacks. Numerous backdoor mitigation techniques have been designed for downstream task models, but their effectiveness is limited when they are adapted to pre-trained encoders, because no label information is available during pre-training. To address backdoor attacks against pre-trained encoders, we propose a mutual information guided backdoor mitigation technique named MIMIC (Mutual Information guided backdoor MItigation for pre-trained enCoders). MIMIC uses the potentially backdoored encoder as the teacher network and applies knowledge distillation to create a clean student encoder from it. Unlike existing knowledge distillation approaches, MIMIC initializes the student with random weights, so the student inherits no backdoors from the teacher network. MIMIC then leverages the mutual information between each layer and the extracted features to locate where benign knowledge lies in the teacher network, and uses this to guide distillation so that clean features are cloned from teacher to student. The distillation loss has two components, a clone loss and an attention loss, which together aim to mitigate backdoors while maintaining encoder performance. Our evaluation on two SSL backdoor attacks demonstrates that MIMIC can significantly reduce the attack success rate using only $\leq 5\%$ of the clean pre-training data that is accessible to the defender, surpassing seven state-of-the-art backdoor mitigation techniques. The source code of MIMIC is available at https://github.com/wssun/MIMIC.
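
The abstract names a two-part distillation loss (clone loss plus attention loss) but does not give the formulas here. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the clone loss is taken to be one minus the cosine similarity between student and teacher embeddings, the attention loss is taken to be a squared distance between channel-pooled, L2-normalized spatial attention maps, and the names `clone_loss`, `attention_loss`, `distillation_loss`, `layer_weights`, and `alpha` are all hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero when normalizing


def clone_loss(student_feat, teacher_feat):
    """Mean (1 - cosine similarity) over a batch of embeddings, shape (N, D).

    Assumption: the paper's clone loss may be defined differently.
    """
    s = student_feat / (np.linalg.norm(student_feat, axis=1, keepdims=True) + EPS)
    t = teacher_feat / (np.linalg.norm(teacher_feat, axis=1, keepdims=True) + EPS)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))


def attention_map(act):
    """Collapse a (N, C, H, W) activation into an L2-normalized (N, H*W) map."""
    pooled = np.mean(act ** 2, axis=1).reshape(act.shape[0], -1)
    return pooled / (np.linalg.norm(pooled, axis=1, keepdims=True) + EPS)


def attention_loss(student_acts, teacher_acts, layer_weights):
    """Weighted squared distance between student and teacher attention maps.

    `layer_weights` is a hypothetical per-layer weight, e.g. derived from a
    mutual information score, with one entry per layer.
    """
    total = 0.0
    for s, t, w in zip(student_acts, teacher_acts, layer_weights):
        diff = attention_map(s) - attention_map(t)
        total += w * float(np.mean(np.sum(diff ** 2, axis=1)))
    return total


def distillation_loss(student_feat, teacher_feat,
                      student_acts, teacher_acts,
                      layer_weights, alpha=1.0):
    """Clone loss plus `alpha` times attention loss (alpha is an assumed knob)."""
    return (clone_loss(student_feat, teacher_feat)
            + alpha * attention_loss(student_acts, teacher_acts, layer_weights))
```

In a training loop, the student would start from random weights and be updated to minimize this loss on the small clean subset, while the teacher stays frozen. The released repository should be consulted for the loss actually used.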
Problem

Research questions and friction points this paper is trying to address.

Mitigating backdoor attacks in self-supervised pre-trained encoders
Using mutual information to locate and distill clean knowledge
Reducing attack success rates with minimal clean data
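
For reference, attack success rate (ASR) is commonly computed as the fraction of trigger-stamped test inputs, excluding those whose true class already is the attacker's target, that the downstream classifier assigns to the target class. The sketch below assumes that common definition; the function name `attack_success_rate` and its arguments are hypothetical, and the paper's exact evaluation protocol may differ.

```python
import numpy as np


def attack_success_rate(preds_on_triggered, true_labels, target_label):
    """Fraction of triggered, non-target-class inputs predicted as the target."""
    preds = np.asarray(preds_on_triggered)
    labels = np.asarray(true_labels)
    keep = labels != target_label  # drop inputs that already belong to the target
    if not keep.any():
        raise ValueError("no non-target-class samples to evaluate")
    return float(np.mean(preds[keep] == target_label))
```

A successful mitigation lowers this value while leaving accuracy on clean inputs roughly unchanged.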
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mutual information guided backdoor mitigation for encoders
Knowledge distillation with random initialization to avoid backdoors
Layer-wise mutual information locates benign features for distillation
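
To make the layer-wise mutual information idea concrete, here is a rough sketch, assuming a simple histogram estimator on scalar summaries of each layer. This is an illustration only: the estimator MIMIC actually uses is not described on this page, and `histogram_mi`, `mi_layer_weights`, and the mean-pooling summary are hypothetical choices.

```python
import numpy as np


def histogram_mi(x, y, bins=16):
    """Histogram estimate of mutual information (in nats) between 1-D samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    mask = pxy > 0                        # skip empty cells to avoid log(0)
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))


def mi_layer_weights(layer_acts, final_feat, bins=16):
    """Score each layer by MI with the final features, then normalize to sum to 1.

    Assumption: every activation and the final feature are reduced to one
    scalar per sample by mean pooling, which discards a lot of information.
    """
    target = final_feat.reshape(final_feat.shape[0], -1).mean(axis=1)
    scores = np.array([
        histogram_mi(a.reshape(a.shape[0], -1).mean(axis=1), target, bins)
        for a in layer_acts
    ])
    total = scores.sum()
    if total <= 0:
        return np.full(len(scores), 1.0 / len(scores))  # fall back to uniform
    return scores / total
```

The resulting weights could then be used to decide how strongly each teacher layer is distilled into the student, for example as the per-layer weights of an attention loss.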
Authors
Tingxu Han
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
Weisong Sun
Nanyang Technological University
Trustworthy Intelligent SE (Software Engineering)
Ziqi Ding
UNSW Sydney
CAPTCHA, Usability, Cognitive Science
Chunrong Fang
Software Institute, Nanjing University
Software Testing, Software Engineering, Computer Science
Hanwei Qian
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
Jiaxun Li
Ph.D. of Statistics, University of Michigan
Statistics, Learning Theory
Zhenyu Chen
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
Xiangyu Zhang