SDiFL: Stable Diffusion-Driven Framework for Image Forgery Localization

📅 2025-08-27

📈 Citations: 0

✨ Influential: 0

career value

242K/year

🤖 AI Summary

Current image forgery localization methods heavily rely on large-scale manual annotations and struggle to generalize to modern generative models like Stable Diffusion. To address this, we propose the first unsupervised localization framework that jointly leverages generative and perceptual capabilities. Our method uniquely exploits the multimodal latent space of Stable Diffusion V3, explicitly injecting high-pass-filtered forgery residuals as an auxiliary modality into the latent representation—enabling cross-modal collaborative modeling between residuals and images while preserving semantic integrity. Crucially, it requires no additional annotations, relying solely on the intrinsic deep perceptual capacity of diffusion models for precise forgery localization. Evaluated on mainstream benchmarks, our approach achieves a 12% improvement in detection performance and demonstrates significantly enhanced generalization to unseen real-world document and natural-scene images, consistently outperforming existing state-of-the-art methods.

Technology Category

Application Category

📝 Abstract

Driven by the new generation of multi-modal large models, such as Stable Diffusion (SD), image manipulation technologies have advanced rapidly, posing significant challenges to image forensics. However, existing image forgery localization methods, which heavily rely on labor-intensive and costly annotated data, are struggling to keep pace with these emerging image manipulation technologies. To address these challenges, we are the first to integrate both image generation and powerful perceptual capabilities of SD into an image forensic framework, enabling more efficient and accurate forgery localization. First, we theoretically show that the multi-modal architecture of SD can be conditioned on forgery-related information, enabling the model to inherently output forgery localization results. Then, building on this foundation, we specifically leverage the multimodal framework of Stable DiffusionV3 (SD3) to enhance forgery localization performance.We leverage the multi-modal processing capabilities of SD3 in the latent space by treating image forgery residuals -- high-frequency signals extracted using specific highpass filters -- as an explicit modality. This modality is fused into the latent space during training to enhance forgery localization performance. Notably, our method fully preserves the latent features extracted by SD3, thereby retaining the rich semantic information of the input image. Experimental results show that our framework achieves up to 12% improvements in performance on widely used benchmarking datasets compared to current state-of-the-art image forgery localization models. Encouragingly, the model demonstrates strong performance on forensic tasks involving real-world document forgery images and natural scene forging images, even when such data were entirely unseen during training.

Problem

Research questions and friction points this paper is trying to address.

Localizing image forgeries without costly annotated data

Integrating Stable Diffusion for enhanced forgery detection

Handling real-world document and natural scene forgeries

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates Stable Diffusion for forgery localization

Fuses forgery residuals as explicit modality

Preserves latent semantic features from SD3

🔎 Similar Papers

FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models