🤖 AI Summary
To address the growing security challenge posed by increasingly photorealistic and hard-to-distinguish diffusion-generated images, this paper proposes a generic detection method grounded in frequency-guided reconstruction error. The core insight is an inherent weakness in diffusion models, identified and exploited here for the first time: their suboptimal reconstruction of the mid-frequency band. The method decomposes input images into frequency components via discrete cosine transform (DCT) or discrete wavelet transform (DWT), reconstructs these components using a lightweight autoencoder, and uses the discrepancy between pre- and post-decomposition reconstruction errors as the discriminative signal. Crucially, it requires no prior knowledge of the generative model, which supports cross-model generalizability and robustness against common image perturbations (e.g., compression, resizing, noise). Extensive experiments demonstrate that the approach achieves significantly higher detection accuracy than state-of-the-art methods across diverse unknown diffusion models and under various corruptions.
📝 Abstract
The rapid advancement of diffusion models has significantly improved high-quality image generation, making generated content increasingly challenging to distinguish from real images and raising concerns about potential misuse. In this paper, we observe that diffusion models struggle to accurately reconstruct mid-band frequency information in real images, suggesting that this limitation could serve as a cue for detecting diffusion-generated images. Motivated by this observation, we propose a novel method called Frequency-guided Reconstruction Error (FIRE), which, to the best of our knowledge, is the first to investigate the influence of frequency decomposition on reconstruction error. FIRE assesses the variation in reconstruction error before and after frequency decomposition, offering a robust method for identifying diffusion-generated images. Extensive experiments show that FIRE generalizes effectively to unseen diffusion models and maintains robustness against diverse perturbations.
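The idea of comparing reconstruction error before and after frequency decomposition can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`midband_mask`, `remove_midband`, `fire_style_score`), the FFT-based band filter, and the band limits `0.1` and `0.4` are all assumptions chosen for illustration. In practice `reconstruct` would presumably wrap a reconstruction pass through a pretrained model; here `toy_reconstruct` is a hypothetical stand-in that simply discards everything above the low band.

```python
import numpy as np


def midband_mask(shape, low=0.1, high=0.4):
    """Boolean mask over the 2D FFT grid selecting radial frequencies in [low, high).

    Frequencies are in cycles per pixel, as returned by np.fft.fftfreq.
    The band limits are illustrative assumptions, not values from the paper.
    """
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    return (radius >= low) & (radius < high)


def remove_midband(image, low=0.1, high=0.4):
    """Return a copy of a 2D grayscale image with its mid-band frequencies zeroed."""
    spectrum = np.fft.fft2(image)
    spectrum[midband_mask(image.shape, low, high)] = 0
    return np.real(np.fft.ifft2(spectrum))


def reconstruction_error(image, reconstruct):
    """Mean absolute error between an image and its reconstruction."""
    return float(np.mean(np.abs(image - reconstruct(image))))


def fire_style_score(image, reconstruct, low=0.1, high=0.4):
    """Change in reconstruction error caused by removing the mid band.

    A large positive value means the reconstructor's error was concentrated
    in the mid band, which is the cue the abstract describes for real images.
    """
    err_before = reconstruction_error(image, reconstruct)
    err_after = reconstruction_error(remove_midband(image, low, high), reconstruct)
    return err_before - err_after


def toy_reconstruct(image, cutoff=0.1):
    """Hypothetical stand-in reconstructor: keeps only frequencies below `cutoff`."""
    spectrum = np.fft.fft2(image)
    spectrum[~midband_mask(image.shape, 0.0, cutoff)] = 0
    return np.real(np.fft.ifft2(spectrum))


# Usage on random noise, which has energy in every band.
rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
score = fire_style_score(image, toy_reconstruct)
```

With this stand-in, the score is positive for any image that carries mid-band energy, because the toy reconstructor cannot reproduce that band. A real detector would threshold the score or feed it to a classifier; which of these FIRE does is not stated in the abstract.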