🤖 AI Summary
Conventional AI-generated image detectors are trained on images from known generators and therefore often fail on images produced by unseen models. To address this limited zero-shot generalization, this paper proposes the Forensic Self-Descriptor (FSD) framework. FSD is trained in a self-supervised manner on real images only. It learns a set of predictive filters whose residuals capture forensic microstructures, the subtle pixel-level patterns left by the image creation process, and models those residuals jointly across multiple scales. The parameters of the resulting compact model serve as an image-level self-description. Because training requires no synthetic images and no prior knowledge of any generator, FSD supports zero-shot detection of synthetic images, open-set source attribution, and unsupervised clustering by source. The reported experiments show higher accuracy and adaptability than competing techniques, including on generators not seen beforehand.
📝 Abstract
The emergence of advanced AI-based tools to generate realistic images poses significant challenges for forensic detection and source attribution, especially as new generative techniques appear rapidly. Traditional methods often fail to generalize to unseen generators because they rely on features specific to the sources known during training. To address this problem, we propose a novel approach that explicitly models forensic microstructures: subtle, pixel-level patterns unique to the image creation process. Using only real images in a self-supervised manner, we learn a set of diverse predictive filters to extract residuals that capture different aspects of these microstructures. By jointly modeling these residuals across multiple scales, we obtain a compact model whose parameters constitute a unique forensic self-description for each image. This self-description enables us to perform zero-shot detection of synthetic images, open-set source attribution, and clustering of images by source without prior knowledge. Extensive experiments demonstrate that our method achieves superior accuracy and adaptability compared to competing techniques, advancing the state of the art in synthetic media forensics.
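The pipeline described in the abstract (predictive filters learned on real images, residual extraction, multi-scale modeling, and a per-image descriptor used for zero-shot detection) can be illustrated with a deliberately simplified sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: a single linear least-squares filter stands in for the learned set of diverse filters, a small linear model fitted to each scale's residual stands in for the paper's joint multi-scale model, and a Gaussian fitted to real-image descriptors with a Mahalanobis score stands in for the detector. All function names are hypothetical.

```python
import numpy as np

def neighborhoods(img, k=3):
    """Return (N, k*k-1) neighbor matrix and (N,) center pixels for all k x k patches."""
    windows = np.lib.stride_tricks.sliding_window_view(img, (k, k))
    flat = windows.reshape(-1, k * k)
    center = (k * k) // 2
    return np.delete(flat, center, axis=1), flat[:, center]

def fit_predictive_filter(real_images, k=3):
    """Assumed stand-in: one least-squares filter predicting a pixel from its
    neighbors, fitted on real images only."""
    X, y = zip(*(neighborhoods(im, k) for im in real_images))
    w, *_ = np.linalg.lstsq(np.vstack(X), np.concatenate(y), rcond=None)
    return w

def residual(img, w, k=3):
    """Prediction error of the filter, reshaped back to a 2-D map."""
    X, y = neighborhoods(img, k)
    shape = (img.shape[0] - k + 1, img.shape[1] - k + 1)
    return (y - X @ w).reshape(shape)

def downsample(img):
    """2x2 average pooling to move to the next coarser scale."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def self_descriptor(img, w, scales=3, k=3):
    """Assumed stand-in for the self-description: at each scale, fit a small
    linear model to the residual and keep its parameters."""
    parts = []
    for _ in range(scales):
        r = residual(img, w, k)
        Xr, yr = neighborhoods(r, k)
        coeffs, *_ = np.linalg.lstsq(Xr, yr, rcond=None)
        parts.append(coeffs)
        img = downsample(img)
    return np.concatenate(parts)

def fit_real_model(descs, ridge=1e-6):
    """Gaussian over real-image descriptors (assumed detector, not from the paper)."""
    mu = descs.mean(axis=0)
    cov = np.cov(descs, rowvar=False) + ridge * np.eye(descs.shape[1])
    return mu, np.linalg.inv(cov)

def anomaly_score(desc, mu, prec):
    """Squared Mahalanobis distance; larger means less like the real images."""
    diff = desc - mu
    return float(diff @ prec @ diff)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "real" images: row-wise random walks, used only to exercise the code.
    real = [np.cumsum(rng.normal(size=(32, 32)), axis=1) for _ in range(40)]
    w = fit_predictive_filter(real)
    descs = np.stack([self_descriptor(im, w) for im in real])
    mu, prec = fit_real_model(descs)
    query = rng.normal(size=(32, 32))  # toy query image with different statistics
    print(anomaly_score(self_descriptor(query, w), mu, prec))
```

The point of the sketch is the data flow rather than the specific models: no synthetic image is used at any stage before scoring, and the descriptor is a set of fitted model parameters rather than a feature vector produced by a classifier. The same descriptors could in principle be passed to a clustering routine for source grouping, which the abstract lists as a further use.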