Robust AI-Synthesized Image Detection via Multi-feature Frequency-aware Learning

📅 2025-04-02

🤖 AI Summary

To address the limited robustness of AI-generated image detection when the generating model is out-of-distribution or the image has been distorted in transmission (e.g., by compression or blurring), this paper proposes a multi-feature fusion framework. A cross-source attention mechanism enhances spatial forensic features by combining three complementary sources: noise correlation, image gradients, and knowledge from a pretrained vision encoder. To capture the spectral anomalies of synthetic images, a frequency-aware architecture built on Frequency-Adaptive Dilated Convolution jointly models spatial and spectral features at low computational cost. Evaluated across 14 heterogeneous generative models, the framework reports significantly higher mean cross-model detection accuracy than existing state-of-the-art methods, along with resilience to real-world distortions such as compression and blurring, supporting its practical use in realistic deployment scenarios.
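
The spatial cues named above (noise and gradients) can be illustrated with a small sketch. It assumes a grayscale image stored as nested Python lists and uses common stand-ins: a 3x3 box-blur residual for the noise component, Sobel kernels for gradients, and the correlation between horizontally adjacent residual values as a simple noise-correlation statistic. All function names are hypothetical, and the paper's actual feature extractors may differ.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def _pixel(img: Image, y: int, x: int) -> float:
    # Replicate padding: clamp coordinates to the image border.
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def noise_residual(img: Image) -> Image:
    """High-pass residual: each pixel minus the mean of its 3x3 neighbourhood."""
    h, w = len(img), len(img[0])
    return [[img[y][x] - sum(_pixel(img, y + dy, x + dx)
                             for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
             for x in range(w)] for y in range(h)]


SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_magnitude(img: Image) -> Image:
    """Per-pixel Sobel gradient magnitude sqrt(gx^2 + gy^2)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    p = _pixel(img, y + i - 1, x + j - 1)
                    gx += SOBEL_X[i][j] * p
                    gy += SOBEL_Y[i][j] * p
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


def neighbour_correlation(res: Image) -> float:
    """Pearson correlation between horizontally adjacent residual values."""
    a = [row[x] for row in res for x in range(len(row) - 1)]
    b = [row[x + 1] for row in res for x in range(len(row) - 1)]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0
```

Camera sensor noise and generator artifacts tend to differ in how residual values correlate with their neighbours, which is why a correlation statistic on the residual, rather than the raw pixels, is the quantity of interest here.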

📝 Abstract

The rapid progress of generative AI (GenAI) technologies has heightened concerns about the misuse of AI-generated imagery. Robust detection methods are therefore especially valuable, particularly under challenging conditions where the target GenAI models are out-of-distribution or the generated images have been perturbed during transmission. This paper introduces a multi-feature fusion framework that enhances spatial forensic feature representations by incorporating three complementary components, namely noise correlation analysis, image gradient information, and pretrained vision encoder knowledge, through a cross-source attention mechanism. Furthermore, to identify spectral abnormalities in synthetic images, we propose a frequency-aware architecture that employs Frequency-Adaptive Dilated Convolution, enabling joint modeling of spatial and spectral features while maintaining low computational complexity. Our framework generalizes strongly across fourteen diverse GenAI systems, including text-to-image diffusion models, autoregressive approaches, and post-processed deepfake methods. Notably, it achieves significantly higher mean accuracy in cross-model detection tasks than existing state-of-the-art techniques. The method is also resilient to various real-world image perturbations, such as compression and blurring. Extensive ablation studies further corroborate the synergistic benefits of fusing multi-model forensic features with frequency-aware learning.
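
The cross-source attention idea can be sketched as single-head scaled dot-product attention in which queries come from the spatial forensic branch and keys/values are the concatenated tokens of the auxiliary sources (noise, gradient, pretrained-encoder features). This is a minimal sketch under stated assumptions, not the paper's module: learned Q/K/V projections, multiple heads, and normalization are omitted, and the function and variable names are hypothetical.

```python
import math
from typing import List

Tokens = List[List[float]]  # a sequence of d-dimensional feature vectors


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_source_attention(queries: Tokens, *sources: Tokens) -> Tokens:
    """Single-head scaled dot-product attention from one source onto others.

    Assumption: identity Q/K/V projections (no learned weights) to keep the
    sketch short; keys and values are the same auxiliary tokens.
    """
    keys = [tok for src in sources for tok in src]  # concatenate all sources
    d = len(queries[0])
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        attended = [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(d)]
        # Residual connection: enhance, rather than replace, the spatial feature.
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


# Hypothetical toy tokens: two spatial tokens, one noise token, one gradient token.
spatial = [[1.0, 0.0], [0.0, 1.0]]
noise = [[2.0, 0.0]]
gradient = [[0.0, 2.0]]
fused = cross_source_attention(spatial, noise, gradient)
```

Each attended vector is a convex combination of the auxiliary tokens, so in the toy example the first spatial token is pulled mostly toward the noise token it aligns with and the second toward the gradient token.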

Problem

Detect AI-generated images robustly when the generator is out-of-distribution or the image has been perturbed in transmission

Fuse multi-feature forensic representations for improved detection accuracy

Identify spectral abnormalities in synthetic images efficiently
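
One common way to look for spectral abnormalities is to inspect an image's Fourier spectrum, where upsampling artifacts from generators are often reported to show up as excess energy or peaks at high frequencies. The sketch below computes a naive 2D DFT in pure Python, a log-magnitude spectrum, and a high-frequency energy ratio. The 0.25 cycles-per-pixel cutoff and all function names are illustrative assumptions; the paper's frequency-aware architecture is a learned network, not this fixed statistic.

```python
import cmath
import math
from typing import List

Image = List[List[float]]


def dft2(img: Image) -> List[List[complex]]:
    """Naive 2D discrete Fourier transform; O((HW)^2), so only for tiny patches."""
    h, w = len(img), len(img[0])
    return [[sum(img[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
                 for y in range(h) for x in range(w))
             for v in range(w)] for u in range(h)]


def log_magnitude_spectrum(img: Image) -> Image:
    """log(1 + |F(u, v)|), the usual way to visualise spectral peaks."""
    return [[math.log1p(abs(c)) for c in row] for row in dft2(img)]


def high_frequency_ratio(img: Image, cutoff: float = 0.25) -> float:
    """Share of non-DC spectral energy above a normalised radial cutoff."""
    spec = dft2(img)
    h, w = len(spec), len(spec[0])
    high = total = 0.0
    for u in range(h):
        for v in range(w):
            if u == 0 and v == 0:
                continue  # skip the DC term (mean brightness)
            # Signed frequencies in cycles per pixel, in [-0.5, 0.5].
            fy = (u if u <= h // 2 else u - h) / h
            fx = (v if v <= w // 2 else v - w) / w
            energy = abs(spec[u][v]) ** 2
            total += energy
            if math.hypot(fy, fx) > cutoff:
                high += energy
    dc = abs(spec[0][0]) ** 2
    if total <= 1e-12 * (total + dc):
        return 0.0  # essentially flat patch: no AC energy to compare
    return high / total
```

A checkerboard pattern, the classic signature of transposed-convolution upsampling, puts nearly all of its energy at the Nyquist frequency and scores close to 1, while a single low-frequency cosine scores close to 0.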

Innovation

Multi-feature fusion with cross-source attention

Frequency-aware architecture built on Frequency-Adaptive Dilated Convolution

Robust detection across diverse GenAI systems
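
As its name indicates, Frequency-Adaptive Dilated Convolution adapts the dilation of a convolution to local frequency content. As a toy illustration only, the sketch below picks a dilation rate per pixel from a Laplacian-based high-frequency proxy: detailed pixels get a small rate to preserve fine detail, smooth pixels a large rate to widen the receptive field. The Laplacian proxy, the threshold, the two-rate choice, the rule itself, and all names are assumptions; the actual FADC learns its adaptation end to end rather than using a hand-set rule.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]


def _px(img: Image, y: int, x: int) -> float:
    # Replicate padding: clamp coordinates to the image border.
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def local_hf_energy(img: Image, y: int, x: int) -> float:
    """Squared 4-neighbour Laplacian: a cheap proxy for local high-frequency content."""
    lap = (4 * _px(img, y, x) - _px(img, y - 1, x) - _px(img, y + 1, x)
           - _px(img, y, x - 1) - _px(img, y, x + 1))
    return lap * lap


def adaptive_dilated_conv(img: Image, kernel: Image,
                          rates: Sequence[int] = (1, 2),
                          threshold: float = 0.01) -> Tuple[Image, List[List[int]]]:
    """3x3 convolution (cross-correlation form) with a per-pixel dilation rate.

    Hypothetical rule: high-frequency pixels use min(rates), smooth pixels use
    max(rates). Returns the output map and the chosen rate per pixel.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    chosen = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            r = min(rates) if local_hf_energy(img, y, x) > threshold else max(rates)
            chosen[y][x] = r
            out[y][x] = sum(kernel[i][j] * _px(img, y + (i - 1) * r, x + (j - 1) * r)
                            for i in range(3) for j in range(3))
    return out, chosen
```

Because the kernel weights are shared and only the sampling offsets change, this kind of adaptation enlarges the receptive field in smooth regions without adding parameters, which is consistent with the low computational cost the abstract emphasizes.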

🔎 Similar Papers

TextureCrop: Enhancing Synthetic Image Detection through Texture-based Cropping