CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI

📅 2025-03-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Detectors for AI-generated images often generalize poorly beyond the generative models they were trained on and are vulnerable to post-processing such as JPEG compression. Method: We propose Co-Spy, a framework that enhances semantic features (e.g., anatomical inconsistencies such as the number of fingers on a hand) and pixel-level artifact features (e.g., pixel value differences), then adaptively integrates the two for more general and robust detection. Contribution/Results: We introduce Co-Spy-Bench, a benchmark comprising 5 real image datasets and 22 state-of-the-art generative models (including FLUX), plus 50k synthetic images collected in the wild from the Internet. Under identical training conditions, Co-Spy improves average accuracy over existing methods by approximately 11% to 34%. Our code is publicly available.

📝 Abstract

With the rapid advancement of generative AI, it is now possible to synthesize high-quality images in a few seconds. Despite the power of these technologies, they raise significant concerns regarding misuse. Current efforts to distinguish between real and AI-generated images may lack generalization, being effective for only certain types of generative models and susceptible to post-processing techniques like JPEG compression. To overcome these limitations, we propose a novel framework, Co-Spy, that first enhances existing semantic features (e.g., the number of fingers in a hand) and artifact features (e.g., pixel value differences), and then adaptively integrates them to achieve more general and robust synthetic image detection. Additionally, we create Co-Spy-Bench, a comprehensive dataset comprising 5 real image datasets and 22 state-of-the-art generative models, including the latest models like FLUX. We also collect 50k synthetic images in the wild from the Internet to enable evaluation in a more practical setting. Our extensive evaluations demonstrate that our detector outperforms existing methods under identical training conditions, achieving an average accuracy improvement of approximately 11% to 34%. The code is available at https://github.com/Megum1/Co-Spy.
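
The abstract describes adaptively integrating semantic and artifact features but gives no architectural detail here. As a rough illustration only, one common way to combine two feature branches is a softmax gate that weights each branch before a small classification head. The sketch below assumes that design; the function names (`fuse`, `detect_score`), the hand-supplied `gate_logits`, and the linear head are all hypothetical and are not taken from the Co-Spy code.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(semantic_feat, artifact_feat, gate_logits):
    """Blend two equal-length feature vectors using softmax gate weights.

    gate_logits is a hypothetical (semantic, artifact) pair of scores. In a
    trained model these would be predicted per image; here they are supplied
    by hand to show the mechanics.
    """
    if len(semantic_feat) != len(artifact_feat):
        raise ValueError("feature vectors must have the same length")
    w_sem, w_art = softmax(gate_logits)
    return [w_sem * s + w_art * a for s, a in zip(semantic_feat, artifact_feat)]


def detect_score(fused, head_weights, bias=0.0):
    """Hypothetical linear head plus sigmoid: estimated probability of 'synthetic'."""
    z = sum(w * x for w, x in zip(head_weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    semantic = [1.0, 0.0]   # toy semantic-branch features
    artifact = [0.0, 1.0]   # toy artifact-branch features
    fused = fuse(semantic, artifact, gate_logits=[0.0, 0.0])  # equal weighting
    print(fused, detect_score(fused, head_weights=[1.0, 1.0]))
```

With equal gate logits the two branches contribute equally; pushing one logit up shifts the fused vector toward that branch, which is the sense in which such a gate could let a detector lean on semantic cues when pixel artifacts have been degraded (for example by JPEG compression), and vice versa.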

Problem

Research questions and friction points this paper is trying to address.

Detecting AI-generated images using combined semantic and pixel features

Overcoming limitations of current methods in generalization and robustness

Creating a comprehensive benchmark for synthetic image detection evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhances semantic and artifact features

Adaptively integrates features for detection

Creates comprehensive dataset for evaluation
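
The abstract names "pixel value differences" as an example of an artifact feature. A minimal sketch of one such low-level statistic is shown below, assuming a grayscale image stored as a rectangular list of rows; the helper names are hypothetical, and this is a generic neighbor-difference computation rather than the paper's artifact extractor.

```python
def neighbor_differences(image):
    """Return (horizontal, vertical) difference maps for a 2D grayscale image.

    horizontal[y][x] = image[y][x + 1] - image[y][x]
    vertical[y][x]   = image[y + 1][x] - image[y][x]
    """
    if not image or not image[0] or any(len(row) != len(image[0]) for row in image):
        raise ValueError("image must be a non-empty rectangular grid")
    width = len(image[0])
    horizontal = [[row[x + 1] - row[x] for x in range(width - 1)] for row in image]
    vertical = [
        [image[y + 1][x] - image[y][x] for x in range(width)]
        for y in range(len(image) - 1)
    ]
    return horizontal, vertical


def mean_abs_difference(image):
    """Average absolute neighbor difference: a crude local-texture statistic."""
    horizontal, vertical = neighbor_differences(image)
    values = [abs(v) for grid in (horizontal, vertical) for row in grid for v in row]
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    toy = [
        [0, 10],
        [20, 40],
    ]
    print(neighbor_differences(toy))
    print(mean_abs_difference(toy))
```

Statistics like this respond to fine pixel-level texture, which is also what lossy post-processing tends to alter; that sensitivity is one reason the abstract pairs artifact features with semantic ones rather than relying on either alone.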

🔎 Similar Papers

TextureCrop: Enhancing Synthetic Image Detection through Texture-based Cropping

2024-07-22 | arXiv.org | Citations: 1

Improving Synthetic Image Detection Towards Generalization: An Image Transformation Perspective

2024-08-13 | arXiv.org | Citations: 0