🤖 AI Summary
This work addresses the growing challenges of misinformation and model attribution posed by increasingly photorealistic AI-generated images. To systematically evaluate the detectability and traceability of such content, the authors organize the Counter Turing Test (CT2) competition and introduce the MS COCOAI dataset. Competition participants applied a broad range of methods, including CNNs, Vision Transformers, frequency-domain analysis, contrastive learning, and multimodal approaches, to both image authenticity verification and source model identification. The results show that binary detection of real versus synthetic images reaches an F1-score above 0.83, whereas the best F1-score for identifying the specific generative model is only 0.4986. This gap indicates that fine-grained model attribution is substantially harder than detecting AI-generated images in general, and it points to a critical bottleneck in current forensic capabilities.
📝 Abstract
The rapid advancements in generative AI technologies, such as Stable Diffusion, DALL-E, and Midjourney, have significantly transformed the creation of synthetic visual content. While these models enable innovation across industries, they also pose serious challenges, including misinformation, disinformation, and biased content generation. The increasing realism of AI-generated images makes their detection a pressing concern for researchers, policymakers, and industry stakeholders.
In this paper, we present the findings of the Defactify 4.0 workshop, which introduced the Counter Turing Test (CT2) for AI-Generated Image Detection. The competition consisted of two key tasks: (1) binary classification of images as either AI-generated or real and (2) identification of the specific generative model responsible for an AI-generated image. To facilitate this, we developed the MS COCOAI dataset, consisting of 50,000 synthetic images from multiple generative models alongside real-world images from the MS COCO dataset.
Participants employed diverse detection strategies, including convolutional neural networks (CNNs), Vision Transformers (ViTs), frequency-based analysis, contrastive learning, and multimodal techniques. The results showed that AI-generated images can be detected reliably (F1-score > 0.83), whereas identifying the exact model used remains significantly more challenging (highest F1-score: 0.4986). These findings highlight the need for improved model fingerprinting, adversarial robustness, and real-time detection mechanisms.
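Both tasks are scored with F1, but the abstract does not say how the score is averaged across classes. As a minimal sketch, assuming macro-averaged F1 (an assumption, not confirmed by the source), the code below computes the metric for a binary task (real vs. AI-generated) and for a multi-class attribution task. The labels `model_a`, `model_b`, `model_c` and all toy predictions are hypothetical and are not competition data.

```python
def f1_for_class(y_true, y_pred, label):
    """One-vs-rest F1 for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Task 1 (hypothetical toy data): binary real vs. AI-generated.
task1_true = ["real", "real", "ai", "ai"]
task1_pred = ["real", "ai", "ai", "ai"]

# Task 2 (hypothetical toy data): which generator produced the image.
task2_true = ["model_a", "model_b", "model_c", "model_a"]
task2_pred = ["model_a", "model_c", "model_b", "model_a"]

print(round(macro_f1(task1_true, task1_pred), 4))  # 0.7333
print(round(macro_f1(task2_true, task2_pred), 4))  # 0.3333
```

In the toy attribution case, confusing two generators with each other drives both of their per-class scores to zero, which shows how macro F1 penalizes fine-grained confusions that a binary real-vs-synthetic score would not expose.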