🤖 AI Summary
This work addresses the growing risk that diffusion-based image editing will be misused for deepfakes and unauthorized image manipulation. Existing immunization methods rely on per-image adversarial optimization, which limits their scalability. To overcome this, we propose the first universal image immunization framework, which injects a single shared semantic adversarial perturbation into any image to be protected. The perturbation preserves visual fidelity while disrupting the diffusion model's understanding of the original semantics, thereby preventing malicious edits. Our approach requires no per-image optimization, works in a data-free setting, and exhibits strong black-box transferability across models. Experiments show that, in the universal-perturbation setting, our method significantly outperforms existing baselines; even under tight perturbation budgets it matches the performance of image-specific techniques and generalizes robustly across multiple state-of-the-art diffusion models.
📝 Abstract
Recent advances in diffusion models have enabled powerful image editing guided by natural language prompts, unlocking new creative possibilities. However, the same capabilities introduce significant ethical and legal risks, such as deepfakes and unauthorized use of copyrighted visual content. To counter these risks, image immunization has emerged as a promising defense against AI-driven semantic manipulation. Yet most existing approaches rely on image-specific adversarial perturbations that must be optimized individually for each image, limiting scalability and practicality. In this paper, we propose the first universal image immunization framework, which generates a single, broadly applicable adversarial perturbation designed specifically for diffusion-based editing pipelines. Inspired by universal adversarial perturbation (UAP) techniques for targeted attacks, our method generates a UAP that embeds a semantic target into every image to be protected while simultaneously suppressing the original content, thereby misdirecting the model's attention during editing. As a result, our approach blocks malicious editing attempts by overwriting the image's original semantic content via the UAP. Moreover, the method operates effectively in data-free settings, requiring no access to training data or domain knowledge, which further enhances its practicality and applicability in real-world scenarios. Extensive experiments show that our method, the first universal immunization approach, significantly outperforms several baselines in the UAP setting. Despite the inherent difficulty of universal perturbations, it also achieves performance on par with image-specific methods under a more restricted perturbation budget, while exhibiting strong black-box transferability across different diffusion models.
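The core idea of a targeted universal perturbation, as described above, can be illustrated with a toy sketch. This is not the paper's actual method or model: a fixed random linear map stands in for the diffusion model's feature encoder, and all loss weights, budgets, and step sizes are illustrative assumptions. One shared perturbation is optimized across several images to pull their encoded features toward a semantic target while pushing them away from each image's original features, under an L∞ budget.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy surrogate "encoder": a fixed random linear map standing in for the
# diffusion model's conditioning encoder (an assumption for illustration).
D, F = 64, 16                        # flattened image dim, feature dim
W = rng.standard_normal((F, D)) / np.sqrt(D)
encode = lambda x: W @ x

def universal_perturbation(images, target_feat, eps=0.05, lr=0.01, steps=200):
    """Optimize ONE shared perturbation for all images: pull encoded
    features toward a semantic target (targeted-UAP term) while pushing
    them away from each image's original features (suppression term)."""
    delta = np.zeros(D)
    for _ in range(steps):
        grad = np.zeros(D)
        for x in images:
            f = encode(x + delta)
            # target-embedding term: gradient of ||f - target||^2
            grad += W.T @ (f - target_feat)
            # suppression term: gradient of -0.1 * ||f - encode(x)||^2
            grad -= 0.1 * (W.T @ (f - encode(x)))
        delta -= lr * np.sign(grad)        # PGD-style signed-gradient step
        delta = np.clip(delta, -eps, eps)  # project onto the L_inf budget
    return delta

images = [rng.standard_normal(D) for _ in range(8)]
target_feat = encode(rng.standard_normal(D))
delta = universal_perturbation(images, target_feat)
```

After optimization, adding the single `delta` to any of the images moves its features toward the target semantics while staying within the imperceptibility budget; a real implementation would replace the linear surrogate with the editing model's encoder and backpropagate through it.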