Prototype Guided Backdoor Defense

📅 2025-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep learning models are vulnerable to backdoor attacks, particularly those employing semantic triggers (e.g., manipulated celebrity faces), which are notoriously difficult to detect and mitigate. To address this, we propose a plug-and-play robust defense method that operates during post-training fine-tuning. Our approach models geometric displacement of class prototype points in the activation space and introduces a novel purification loss function to suppress trigger-induced representation shifts. This is the first method to achieve effective defense against semantic backdoors, establishing a generalizable defense paradigm that requires no prior knowledge of trigger characteristics—thereby overcoming the trigger-type sensitivity inherent in existing approaches. Extensive experiments demonstrate state-of-the-art performance across diverse trigger types, including unseen semantic attacks, significantly enhancing model robustness against both known and unknown backdoor threats.

📝 Abstract
Deep learning models are susceptible to backdoor attacks, in which malicious attackers perturb a small subset of the training data with a trigger to cause misclassifications. Various triggers have been used, including semantic triggers that are easily realizable without requiring the attacker to manipulate the image. The emergence of generative AI has eased the generation of varied poisoned samples. Robustness across trigger types is crucial for effective defense. We propose Prototype Guided Backdoor Defense (PGBD), a robust post-hoc defense that scales across different trigger types, including previously unsolved semantic triggers. PGBD exploits displacements in the geometric space of activations to penalize movement toward the trigger. This is done using a novel sanitization loss in a post-hoc fine-tuning step. The geometric approach scales easily to all types of attacks, and PGBD achieves better performance across all settings. We also present the first defense against a new semantic attack on celebrity face images. Project page: https://venkatadithya9.github.io/pgbd.github.io/
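The abstract's core idea, penalizing activation-space movement away from class prototypes and toward a trigger direction, can be sketched as follows. This is an illustrative reconstruction, not the paper's actual loss: the function names, the use of per-class mean activations as prototypes, and a known shift direction `shift_dir` are all assumptions made for the sketch.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Prototype of each class: the mean activation vector of its
    clean samples (an assumed, simple choice of prototype)."""
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def sanitization_penalty(features, labels, prototypes, shift_dir):
    """Illustrative penalty: mean projection of each sample's
    displacement from its class prototype onto a hypothesized
    trigger-induced shift direction. Larger values mean the
    activations have drifted toward the trigger."""
    disp = features - prototypes[labels]          # displacement from prototype
    unit = shift_dir / np.linalg.norm(shift_dir)  # unit trigger direction
    return float(np.mean(disp @ unit))            # average projected drift
```

During fine-tuning, such a penalty would be added to the task loss so that gradient updates suppress activation drift along the trigger direction; the real PGBD loss may differ in how prototypes and directions are estimated.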
Problem

Research questions and friction points this paper is trying to address.

Defends deep learning models against backdoor attacks
Addresses various trigger types including semantic triggers
Provides robust protection using geometric activation analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prototype Guided Backdoor Defense (PGBD) method
Geometric space displacement analysis
Novel sanitization loss fine-tuning