SARE: Semantic-Aware Reconstruction Error for Generalizable Diffusion-Generated Image Detection

📅 2025-08-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the poor out-of-distribution (OOD) generalization of detection methods for diffusion-generated images across unseen generative models, this paper proposes a robust detection framework based on Semantic-Aware Reconstruction Error (SARE). The method leverages a pre-trained diffusion model for caption-guided image reconstruction and employs a vision-language model to quantify the semantic distance between the original and reconstructed images; the resulting semantic reconstruction deviation serves as the primary authenticity indicator. Unlike conventional approaches that rely on model-specific artifacts, the paper identifies and exploits an intrinsic property: authentic images, whose higher semantic complexity is not fully captured by their captions, exhibit larger semantic reconstruction deviations than synthetic ones. This enables model-agnostic detection that generalizes across generators. Extensive experiments on the GenImage and CommunityForensics benchmarks demonstrate significant improvements over state-of-the-art methods, especially under OOD settings with unseen generators, with consistently high detection accuracy confirming strong generalizability and practical utility.

📝 Abstract
Recently, diffusion-generated image detection has gained increasing attention, as the rapid advancement of diffusion models has raised serious concerns about their potential misuse. While existing detection methods have achieved promising results, their performance often degrades significantly when facing fake images from unseen, out-of-distribution (OOD) generative models, since they primarily rely on model-specific artifacts. To address this limitation, we explore a fundamental property commonly observed in fake images. Motivated by the observation that fake images tend to exhibit higher similarity to their captions than real images, we propose a novel representation, namely Semantic-Aware Reconstruction Error (SARE), that measures the semantic difference between an image and its caption-guided reconstruction. The hypothesis behind SARE is that real images, whose captions often fail to fully capture their complex visual content, may undergo noticeable semantic shifts during the caption-guided reconstruction process. In contrast, fake images, which closely align with their captions, show minimal semantic changes. By quantifying these semantic shifts, SARE can be utilized as a discriminative feature for robust detection across diverse generative models. We empirically demonstrate that the proposed method exhibits strong generalization, outperforming existing baselines on benchmarks including GenImage and CommunityForensics.
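The abstract describes SARE as the semantic shift between an image and its caption-guided reconstruction, thresholded to separate real from fake. A minimal sketch of that scoring step, assuming the two embeddings come from a vision-language model such as CLIP (the function names and the threshold value are illustrative, not taken from the paper):

```python
import numpy as np

def sare_score(orig_emb: np.ndarray, recon_emb: np.ndarray) -> float:
    """Semantic reconstruction error: cosine distance between the
    embedding of the original image and that of its caption-guided
    reconstruction (embeddings assumed to come from a VLM like CLIP)."""
    a = orig_emb / np.linalg.norm(orig_emb)
    b = recon_emb / np.linalg.norm(recon_emb)
    return float(1.0 - a @ b)

def classify(orig_emb: np.ndarray, recon_emb: np.ndarray,
             threshold: float = 0.15) -> str:
    # Per the paper's hypothesis, real images shift more semantically
    # during caption-guided reconstruction, so a larger SARE suggests
    # "real" and a smaller one suggests "fake". The 0.15 threshold is
    # a placeholder, not a value reported in the paper.
    return "real" if sare_score(orig_emb, recon_emb) > threshold else "fake"
```

In the full pipeline, `orig_emb` and `recon_emb` would be produced by captioning the image, regenerating it from that caption with a pre-trained diffusion model, and embedding both images with the vision-language encoder.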
Problem

Research questions and friction points this paper is trying to address.

Detect diffusion-generated images robustly across unseen models
Measure semantic difference between images and caption-guided reconstructions
Improve generalization in fake image detection using SARE
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic-Aware Reconstruction Error (SARE)
Measures semantic difference via caption-guided reconstruction
Robust detection across diverse generative models