🤖 AI Summary
To address the lack of effective adversarial attacks against pixel-domain diffusion models (PDMs), this paper proposes AtkPDM, the first systematic framework for disrupting PDM-based image editing methods such as SDEdit. Methodologically, AtkPDM combines a feature-representation attack loss that exploits vulnerabilities in the denoising U-Net's intermediate features with a latent optimization strategy that preserves the naturalness of the adversarial image. Its contributions are threefold: (1) it establishes the first adversarial attack paradigm designed specifically for PDM-based editing; (2) it achieves high attack success rates (>92%) while maintaining image fidelity (PSNR > 28 dB) and natural visual quality; and (3) it remains robust against common defenses, including JPEG compression and denoising, and extends to latent diffusion models (LDMs), where it performs comparably to existing approaches.
📝 Abstract
Diffusion Models have emerged as powerful generative models for high-quality image synthesis, with many subsequent image editing techniques based on them. However, the ease of text-based image editing introduces significant risks, such as malicious editing for scams or intellectual property infringement. Previous works have attempted to safeguard images from diffusion-based editing by adding imperceptible perturbations. These methods are costly and specifically target prevalent Latent Diffusion Models (LDMs), while Pixel-domain Diffusion Models (PDMs) remain largely unexplored and robust against such attacks. Our work addresses this gap by proposing a novel attack framework, AtkPDM. AtkPDM is mainly composed of a feature representation attacking loss that exploits vulnerabilities in denoising UNets and a latent optimization strategy to enhance the naturalness of adversarial images. Extensive experiments demonstrate the effectiveness of our approach in attacking dominant PDM-based editing methods (e.g., SDEdit) while maintaining reasonable fidelity and robustness against common defense methods. Additionally, our framework is extensible to LDMs, achieving comparable performance to existing approaches.