Prototype-Guided Robust Learning against Backdoor Attacks

📅 2025-09-03
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the challenge of training robust models on backdoor-poisoned training data with limited clean validation samples, this paper proposes Prototype-Guided Robust Learning (PGRL). PGRL constructs class-wise prototype vectors from only a minimal number of benign samples and enforces alignment between model outputs and these prototypes, thereby suppressing backdoor memorization and enhancing intrinsic robustness. Unlike existing defenses, PGRL requires no prior knowledge of trigger patterns, does not assume a high poisoning rate, and operates effectively with scarce clean data, achieving, for the first time, generalization across architectures and datasets as well as resilience to adaptive attacks. In comprehensive evaluations against eight state-of-the-art defense methods, PGRL achieves superior robust accuracy. Notably, it maintains high robustness even under strong adaptive attacks, demonstrating effectiveness in worst-case scenarios.

๐Ÿ“ Abstract
Backdoor attacks poison the training data to embed a backdoor in the model, causing it to behave normally on legitimate inputs but maliciously when specific trigger signals appear. Training a benign model from a dataset poisoned by backdoor attacks is challenging. Existing works rely on various assumptions and can only defend against backdoor attacks with specific trigger signals, high poisoning ratios, or when the defender possesses a large, untainted validation dataset. In this paper, we propose a defense called Prototype-Guided Robust Learning (PGRL), which overcomes all the aforementioned limitations and is robust against diverse backdoor attacks. Leveraging a tiny set of benign samples, PGRL generates prototype vectors to guide the training process. We compare our PGRL with eight existing defenses, showing that it achieves superior robustness. We also demonstrate that PGRL generalizes well across various architectures, datasets, and advanced attacks. Finally, to evaluate PGRL in the worst-case scenario, we perform an adaptive attack in which the attacker fully knows the details of the defense.
Problem

Research questions and friction points this paper is trying to address.

Defending against diverse backdoor attacks in poisoned datasets
Overcoming limitations of existing defenses with specific assumptions
Training robust models using minimal benign samples as guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prototype-guided training using benign samples
Robust against diverse backdoor attack types
Generalizes across architectures and datasets
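The core idea above, building class-wise prototypes from a handful of clean samples and pulling model features toward them, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature extractor, the cosine-based alignment loss, and all function names here are assumptions made for clarity.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Average the feature vectors of the few clean samples per class.

    features: (n_samples, dim) array from some feature extractor (assumed given).
    Returns L2-normalized (num_classes, dim) prototypes so alignment
    can be measured with cosine similarity.
    """
    protos = np.stack(
        [features[labels == c].mean(axis=0) for c in range(num_classes)]
    )
    return protos / np.linalg.norm(protos, axis=1, keepdims=True)

def alignment_loss(feature, prototype):
    """1 - cosine similarity between a sample's feature and its class prototype.

    Minimizing this during training pulls representations toward the
    clean-sample prototype; a poisoned sample whose trigger drags its
    feature toward another class incurs a high loss instead of being memorized.
    """
    f = feature / np.linalg.norm(feature)
    return 1.0 - float(f @ prototype)
```

In a full training loop this term would be added to the usual classification loss; here it only serves to show how a tiny benign set can anchor each class.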