🤖 AI Summary
Existing AI-generated image detectors rely heavily on priors specific to particular generative models, so they generalize poorly across diverse generators and real-world scenarios. Method: This paper proposes a self-supervised detection framework that requires no access to a generator's internals. Its core innovation is the first use of camera EXIF metadata to define classification and pairwise-ranking pretraining tasks, so that a feature extractor trained only on camera-captured photographs learns the intrinsic imaging characteristics of authentic images. For one-class detection, a Gaussian mixture model is fit to the features of photographs, and low-likelihood samples are flagged as AI-generated. For binary detection, the pretrained extractor regularizes a classifier of the same architecture that operates on high-frequency residuals from spatially scrambled patches. Contribution/Results: The method supports both one-class anomaly detection and binary classification within a single framework. It substantially outperforms state-of-the-art approaches on images from multiple generative models and on real-world datasets, demonstrating strong cross-model generalization and robustness to common benign perturbations such as compression and noise.
📝 Abstract
The proliferation of AI-generated imagery poses escalating challenges for multimedia forensics, yet many existing detectors depend on assumptions about the internals of specific generative models, limiting their cross-model applicability. We introduce a self-supervised approach for detecting AI-generated images that leverages camera metadata, specifically exchangeable image file format (EXIF) tags, to learn features intrinsic to digital photography. Our pretext task trains a feature extractor solely on camera-captured photographs by classifying categorical EXIF tags (e.g., camera model and scene type) and pairwise-ranking ordinal and continuous EXIF tags (e.g., focal length and aperture value). Using these EXIF-induced features, we first perform one-class detection by modeling the distribution of photographic images with a Gaussian mixture model and flagging low-likelihood samples as AI-generated. We then extend this to binary detection, which treats the learned extractor as a strong regularizer for a classifier of the same architecture operating on high-frequency residuals from spatially scrambled patches. Extensive experiments across various generative models demonstrate that our EXIF-induced detectors substantially advance the state of the art, delivering strong generalization to in-the-wild samples and robustness to common benign image perturbations.
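The input to the binary detector, high-frequency residuals from spatially scrambled patches, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration, since the abstract does not give these details: the function names `high_frequency_residual` and `scramble_patches` are hypothetical, the high-pass filter is a simple "image minus 3x3 box blur", the residual is computed before the patches are shuffled, and the image is a single-channel array whose sides are multiples of the patch size.

```python
import numpy as np


def high_frequency_residual(image):
    """Return the image minus its 3x3 box-blurred version (edge-padded).

    Assumption: a box blur stands in for whatever high-pass filter the
    paper actually uses.
    """
    img = image.astype(np.float64)
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    # Average the nine shifted copies of the padded image.
    blurred = sum(
        padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)
    ) / 9.0
    return img - blurred


def scramble_patches(image, patch_size, rng):
    """Randomly permute the non-overlapping square patches of a 2-D array."""
    h, w = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, p, cols, p) -> (rows, cols, p, p) -> (rows * cols, p, p)
    patches = (
        image.reshape(rows, patch_size, cols, patch_size)
        .swapaxes(1, 2)
        .reshape(rows * cols, patch_size, patch_size)
    )
    shuffled = patches[rng.permutation(rows * cols)]
    # Undo the reshaping to reassemble a full-size array.
    return (
        shuffled.reshape(rows, cols, patch_size, patch_size)
        .swapaxes(1, 2)
        .reshape(h, w)
    )


# Usage on a synthetic 8x8 "image" (illustrative data only).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8)).astype(np.float64)
residual = high_frequency_residual(image)
scrambled = scramble_patches(residual, patch_size=4, rng=rng)
```

Scrambling discards the global layout of the scene while keeping local pixel statistics within each patch, which is presumably why such an input pushes a classifier toward low-level imaging traces rather than semantic content.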