🤖 AI Summary
To address the limited interpretability and coarse localization of existing deepfake detectors, this paper proposes DiffusionFF, a diffusion-based artifact localization framework. The method uses a denoising diffusion probabilistic model to generate high-fidelity Structural Dissimilarity (DSSIM) maps that precisely capture subtle manipulation traces, then fuses these maps with high-level semantic features extracted by a pretrained forgery detector. Compared to existing approaches, the framework maintains high detection accuracy while significantly improving fine-grained localization of forged regions. Extensive experiments demonstrate state-of-the-art performance under both cross-dataset and intra-dataset evaluation protocols. Moreover, the framework produces intuitive, visually grounded explanations, enhancing model interpretability and fostering user trust without compromising detection reliability.
📝 Abstract
The rapid evolution of deepfake generation techniques demands robust and accurate face forgery detection algorithms. While determining whether an image has been manipulated remains essential, the ability to precisely localize forgery artifacts has become increasingly important for improving model explainability and fostering user trust. To address this challenge, we propose DiffusionFF, a novel framework that enhances face forgery detection through diffusion-based artifact localization. Our method utilizes a denoising diffusion model to generate high-quality Structural Dissimilarity (DSSIM) maps, which effectively capture subtle traces of manipulation. These DSSIM maps are then fused with high-level semantic features extracted by a pretrained forgery detector, leading to significant improvements in detection accuracy. Extensive experiments on both cross-dataset and intra-dataset benchmarks demonstrate that DiffusionFF not only achieves superior detection performance but also offers precise and fine-grained artifact localization, highlighting its overall effectiveness.
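The paper does not include implementation details here, but the DSSIM maps it relies on follow a standard definition: per-pixel structural dissimilarity, DSSIM = (1 - SSIM) / 2, computed from local window statistics. A minimal sketch of that computation (the window size, constants, and `dssim_map` helper name are illustrative assumptions, not the authors' code):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def dssim_map(x, y, size=7, data_range=1.0):
    """Per-pixel DSSIM = (1 - SSIM) / 2 between two grayscale images.

    Local means/variances/covariance are taken over a size x size window;
    c1 and c2 are the usual SSIM stabilizing constants.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x = uniform_filter(x, size)
    mu_y = uniform_filter(y, size)
    var_x = uniform_filter(x * x, size) - mu_x ** 2
    var_y = uniform_filter(y * y, size) - mu_y ** 2
    cov_xy = uniform_filter(x * y, size) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
    # SSIM lies in [-1, 1], so DSSIM lies in [0, 1]; 0 means no discrepancy.
    return (1.0 - ssim) / 2.0
```

Identical inputs yield a map of (near-)zeros, while manipulated regions produce elevated DSSIM values, which is what makes such maps usable as fine-grained artifact localization targets.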