🤖 AI Summary
Diffusion models such as Stable Diffusion pose significant security risks, including deepfakes and copyright infringement, owing to their strong customization capabilities; existing protective perturbations are easily removed by denoising-based purification, leaving images exposed to misuse again. To address this, the authors formalize the anti-purification task and propose AntiPure, a diagnostic protective perturbation built on two synergistic guidance mechanisms: Patch-wise Frequency Guidance, which weakens the purification model's control over high-frequency components of the purified image, and Erroneous Timestep Guidance, which disrupts the model's denoising strategy across timesteps. Together, these embed imperceptible perturbations that persist through representative purification settings. Experiments show that AntiPure achieves minimal perceptual discrepancy while maximizing post-customization distortion, outperforming other protective perturbation methods within the purification-customization workflow and exposing security vulnerabilities in mainstream purification strategies.
📝 Abstract
Diffusion models like Stable Diffusion have become prominent in visual synthesis tasks due to their powerful customization capabilities, which also introduce significant security risks, including deepfakes and copyright infringement. In response, a class of methods known as protective perturbation has emerged, mitigating image misuse by injecting imperceptible adversarial noise. However, purification can remove protective perturbations, thereby exposing images again to the risk of malicious forgery. In this work, we formalize the anti-purification task, highlighting challenges that hinder existing approaches, and propose a simple diagnostic protective perturbation named AntiPure. AntiPure exposes vulnerabilities of purification within the "purification-customization" workflow, owing to two guidance mechanisms: 1) Patch-wise Frequency Guidance, which reduces the model's influence over high-frequency components in the purified image, and 2) Erroneous Timestep Guidance, which disrupts the model's denoising strategy across different timesteps. With additional guidance, AntiPure embeds imperceptible perturbations that persist under representative purification settings, achieving effective post-customization distortion. Experiments show that, as a stress test for purification, AntiPure achieves minimal perceptual discrepancy and maximal distortion, outperforming other protective perturbation methods within the purification-customization workflow.
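To make the protective-perturbation setup concrete, the toy sketch below illustrates the general shape of such an attack: a bounded perturbation is optimized so that, even after purification, the image retains elevated high-frequency energy measured patch-wise in the Fourier domain. Everything here is a hypothetical stand-in, not AntiPure's actual method: `toy_purifier` is a simple box blur substituting for diffusion purification, `highfreq_energy` is an assumed proxy objective loosely inspired by Patch-wise Frequency Guidance, and gradients are estimated with SPSA instead of backpropagating through a diffusion model. Erroneous Timestep Guidance is omitted, since it requires a real timestep-conditioned denoiser.

```python
import numpy as np

rng = np.random.default_rng(0)

def highfreq_energy(img, patch=8, keep=2):
    # Patch-wise high-frequency energy: mean magnitude of FFT
    # coefficients outside the central low-frequency block of each
    # non-overlapping patch (assumed proxy, not the paper's objective).
    vals = []
    for i in range(0, img.shape[0], patch):
        for j in range(0, img.shape[1], patch):
            f = np.fft.fftshift(np.fft.fft2(img[i:i + patch, j:j + patch]))
            c = patch // 2
            mask = np.ones((patch, patch), bool)
            mask[c - keep:c + keep, c - keep:c + keep] = False  # drop low freqs
            vals.append(np.abs(f[mask]).mean())
    return float(np.mean(vals))

def toy_purifier(img):
    # Stand-in for diffusion-based purification: a 3x3 box blur.
    pad = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = pad[i:i + 3, j:j + 3].mean()
    return out

def attack_loss(x):
    # Maximize high-frequency residue that survives purification.
    return highfreq_energy(toy_purifier(x))

def pgd_attack(x0, eps=8 / 255, step=2 / 255, iters=10, c=1e-3):
    # Sign-gradient ascent with SPSA gradient estimates, projected to
    # an L-infinity ball of radius eps (toy scale only).
    delta = np.zeros_like(x0)
    for _ in range(iters):
        u = rng.choice([-1.0, 1.0], size=x0.shape)
        lp = attack_loss(np.clip(x0 + delta + c * u, 0, 1))
        lm = attack_loss(np.clip(x0 + delta - c * u, 0, 1))
        g = (lp - lm) / (2 * c) * u
        delta = np.clip(delta + step * np.sign(g), -eps, eps)
    return np.clip(x0 + delta, 0, 1)
```

In the real setting, the blur would be replaced by a diffusion purifier and the SPSA estimate by exact gradients through it; the sketch only shows how a frequency-domain objective and an imperceptibility budget combine in one optimization loop.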