🤖 AI Summary
To address poor generalization in synthetic image detection caused by interference from image resolution and stylistic variations, this paper proposes a robust diffusion-based detection framework. Methodologically, it introduces: (1) a multi-scale style backtracking module that explicitly models and mitigates style bias between training data and generative models; (2) a Correntropy-based Sparse Additive Machine (CSAM) for feature refinement and disentanglement, enhancing cross-resolution and cross-style generalization; and (3) a structure-aware learning mechanism to jointly strengthen local discriminability and global consistency. Extensive experiments demonstrate substantial improvements over state-of-the-art methods across multiple benchmark datasets. Crucially, the framework exhibits superior robustness and universality under challenging zero-shot settings—i.e., when evaluated on unseen generators, arbitrary resolutions, and diverse artistic styles—without requiring retraining or fine-tuning.
📝 Abstract
Generative artificial intelligence holds significant potential for abuse, and generative image detection has become a key focus of research. However, existing methods primarily focused on detecting a specific generative model and emphasizing the localization of synthetic regions, while neglecting the interference caused by image size and style on model learning. Our goal is to reach a fundamental conclusion: Is the image real or generated? To this end, we propose a diffusion model-based generative image detection framework termed Hierarchical Retrospection Refinement~(HRR). It designs a multi-scale style retrospection module that encourages the model to generate detailed and realistic multi-scale representations, while alleviating the learning biases introduced by dataset styles and generative models. Additionally, based on the principle of correntropy sparse additive machine, a feature refinement module is designed to reduce the impact of redundant features on learning and capture the intrinsic structure and patterns of the data, thereby improving the model's generalization ability. Extensive experiments demonstrate the HRR framework consistently delivers significant performance improvements, outperforming state-of-the-art methods in generated image detection task.