Purify Once, Edit Freely: Breaking Image Protections under Model Mismatch

📅 2026-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing image protection methods lack robustness against heterogeneous diffusion-model attackers and have not been systematically evaluated under model mismatch. This work proposes a unified post-release purification framework that restores image editability without requiring the original image or internal knowledge of the defense mechanism, even when the attacker’s and defender’s model architectures differ. It is the first to systematically uncover failure modes of the “purify once, edit freely” paradigm, and it introduces two practical purifiers: VAE-Trans, a VAE-Transformer module that corrects protected images via latent-space projection, and EditorClean, a Diffusion Transformer model that performs instruction-guided reconstruction. Evaluated across 2,100 editing tasks, EditorClean improves PSNR by 3–6 dB and reduces FID by 50–70% relative to protected inputs, demonstrating that most protection mechanisms fail after purification.
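The reported gains are in PSNR (decibels, higher is better) and FID (lower is better). As a quick reference for how the PSNR numbers are computed, here is a minimal sketch for images scaled to [0, 1]; the function name is illustrative, not from the paper:

```python
import numpy as np

def psnr(reference: np.ndarray, image: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    mse = np.mean((reference - image) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# A uniform pixel error of 0.1 on a [0, 1] image gives 20 dB:
ref = np.zeros((8, 8))
print(round(psnr(ref, ref + 0.1), 6))  # → 20.0
```

An improvement of 3–6 dB thus corresponds to roughly a 2–4x reduction in root-mean-square error against the reference edit.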

📝 Abstract
Diffusion models enable high-fidelity image editing but can also be misused for unauthorized style imitation and harmful content generation. To mitigate these risks, proactive image protection methods embed small, often imperceptible adversarial perturbations into images before sharing to disrupt downstream editing or fine-tuning. However, in realistic post-release scenarios, content owners cannot control downstream processing pipelines, and protections optimized for a surrogate model may fail when attackers use mismatched diffusion pipelines. Existing purification methods can weaken protections but often sacrifice image quality and rarely examine architectural mismatch. We introduce a unified post-release purification framework to evaluate protection survivability under model mismatch. We propose two practical purifiers: VAE-Trans, which corrects protected images via latent-space projection, and EditorClean, which performs instruction-guided reconstruction with a Diffusion Transformer to exploit architectural heterogeneity. Both operate without access to the original images or defense internals. Across 2,100 editing tasks and six representative protection methods, EditorClean consistently restores editability. Compared to protected inputs, it improves PSNR by 3–6 dB and reduces FID by 50–70% on downstream edits, while outperforming prior purification baselines by about 2 dB PSNR and 30% lower FID. Our results reveal a purify-once, edit-freely failure mode: once purification succeeds, the protective signal is largely removed, enabling unrestricted editing. This highlights the need to evaluate protections under model mismatch and design defenses robust to heterogeneous attackers.
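The latent-space projection idea behind VAE-Trans can be illustrated with a toy analogue: if clean images lie near a low-dimensional manifold, projecting a protected image onto that manifold and reconstructing it strips the off-manifold component of the protective perturbation. Below is a minimal numpy sketch in which a linear subspace stands in for the VAE latent space; all names and the linear model are illustrative, not the paper's actual purifier:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "clean-image manifold": a k-dimensional subspace of R^d.
d, k = 64, 8
basis = np.linalg.qr(rng.standard_normal((d, k)))[0]  # orthonormal d x k basis

def purify(x: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project onto the latent subspace and reconstruct, discarding the
    component of the perturbation that lies off the clean-image manifold."""
    return (x @ basis) @ basis.T

clean = basis @ rng.standard_normal(k)            # a "clean image" on the manifold
protected = clean + 0.1 * rng.standard_normal(d)  # add a protective perturbation
purified = purify(protected, basis)

# Purification moves the image back toward the clean one:
print(np.linalg.norm(purified - clean) < np.linalg.norm(protected - clean))  # True
```

A learned VAE plays the role of the subspace here: because adversarial perturbations are largely off the natural-image manifold, an encode-decode round trip attenuates them while preserving image content.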
Problem

Research questions and friction points this paper is trying to address.

image protection
model mismatch
diffusion models
adversarial perturbations
post-release purification
Innovation

Methods, ideas, or system contributions that make the work stand out.

model mismatch
image purification
diffusion transformer
adversarial protection
editability restoration
Qichen Zhao
Peking University
Shengfang Zhai
Peking University
Trustworthy AI · Generative Models · AI Privacy · Backdoor Attacks
Xinjian Bai
Peking University
Qingni Shen
Peking University
Qiqi Lin
Peking University
Yansong Gao
The University of Western Australia
Zhonghai Wu
Peking University