Backdoor Defense in Diffusion Models via Spatial Attention Unlearning

📅 2025-04-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Text-to-image diffusion models are vulnerable to backdoor attacks, yet existing defenses struggle to precisely identify and erase implicit trigger representations in high-dimensional generative latent spaces. To address this, we propose Spatial Attention-based Unlearning (SAU), the first method that integrates spatial attention mechanisms with latent-space unlearning to localize and remove implicit trigger representations within the diffusion model’s latent space. SAU supports diverse trigger types—including pixel-level perturbations and style-based triggers—without modifying the model architecture. Leveraging CLIP-guided evaluation and reverse gradient suppression, it achieves lossless semantic unlearning of backdoors. Our method attains 100% trigger removal accuracy and a CLIP similarity score of 0.7023, significantly outperforming baseline defenses, while preserving both image fidelity and text–image alignment performance.

📝 Abstract
Text-to-image diffusion models are increasingly vulnerable to backdoor attacks, where malicious modifications to the training data cause the model to generate unintended outputs when specific triggers are present. While classification models have seen extensive development of defense mechanisms, generative models remain largely unprotected due to their high-dimensional output space, which complicates the detection and mitigation of subtle perturbations. Defense strategies for diffusion models, in particular, remain under-explored. In this work, we propose Spatial Attention Unlearning (SAU), a novel technique for mitigating backdoor attacks in diffusion models. SAU leverages latent space manipulation and spatial attention mechanisms to isolate and remove the latent representation of backdoor triggers, ensuring precise and efficient removal of malicious effects. We evaluate SAU across various types of backdoor attacks, including pixel-based and style-based triggers, and demonstrate its effectiveness in achieving 100% trigger removal accuracy. Furthermore, SAU achieves a CLIP score of 0.7023, outperforming existing methods while preserving the model's ability to generate high-quality, semantically aligned images. Our results show that SAU is a robust, scalable, and practical solution for securing text-to-image diffusion models against backdoor attacks.
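The abstract describes SAU's core loop: use spatial attention to localize where a trigger lives in the latent representation, then push the latent away from that trigger representation while leaving clean regions untouched. The paper does not give equations here, so the sketch below is a toy numpy illustration under assumptions: attention is approximated by a softmax over per-location feature energy (a real implementation would read the diffusion model's attention maps), and "reverse gradient" is a single ascent step on the distance to the trigger latent.

```python
import numpy as np

def spatial_attention(features):
    """Toy spatial attention: softmax over per-location feature energy.

    `features` is (H, W, C). This energy-based proxy is an assumption;
    SAU itself works with the diffusion model's attention mechanisms.
    """
    energy = (features ** 2).sum(axis=-1)            # (H, W)
    flat = energy.flatten()
    weights = np.exp(flat - flat.max())
    weights /= weights.sum()
    return weights.reshape(energy.shape)             # non-negative, sums to 1

def unlearning_step(latent, trigger_latent, attn, lr=0.5):
    """One reverse-gradient step: move the latent AWAY from the trigger
    representation, weighted by attention so clean regions barely move."""
    grad = latent - trigger_latent                   # grad of 0.5*||latent - trigger||^2
    return latent + lr * attn[..., None] * grad      # ascend => suppress the trigger

rng = np.random.default_rng(0)
latent = rng.normal(size=(8, 8, 4))
trigger = latent.copy()
trigger[2:4, 2:4] += 3.0                             # implant a localized "trigger" patch

attn = spatial_attention(trigger - latent)           # attention peaks on the patch
updated = unlearning_step(latent, trigger, attn)

before = np.linalg.norm(latent - trigger)
after = np.linalg.norm(updated - trigger)
print(after > before)                                # distance to trigger grows
```

The attention weighting is what makes the removal "precise" in the abstract's sense: locations with no trigger energy receive near-zero weight, so the update leaves them essentially unchanged.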
Problem

Research questions and friction points this paper is trying to address.

Defending diffusion models against backdoor attacks
Removing malicious triggers via spatial attention
Preserving image quality while eliminating backdoors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatial Attention Unlearning removes backdoor triggers
Leverages latent space and attention mechanisms
Achieves 100% trigger removal with high CLIP score
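The CLIP score cited above (0.7023) measures text-image alignment; it is conventionally the cosine similarity between L2-normalized CLIP image and text embeddings. A minimal sketch of that metric, using random stand-in vectors rather than a real CLIP encoder (an assumption for self-containment):

```python
import numpy as np

def clip_style_score(image_emb, text_emb):
    """Cosine similarity between L2-normalized embeddings -- the quantity
    behind CLIP-score-style evaluations. Real use would take embeddings
    from a pretrained CLIP model, not random vectors."""
    a = image_emb / np.linalg.norm(image_emb)
    b = text_emb / np.linalg.norm(text_emb)
    return float(a @ b)

rng = np.random.default_rng(1)
text = rng.normal(size=512)
aligned = text + 0.5 * rng.normal(size=512)   # image embedding close to the prompt
unrelated = rng.normal(size=512)              # embedding of an off-prompt image

print(clip_style_score(aligned, text) > clip_style_score(unrelated, text))  # True
```

Under this metric, a successful defense should keep the score on clean prompts high (alignment preserved) while generations from triggered prompts no longer match the attacker's target.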