Frequency-Aware Semantic Fusion with Gated Injection for AI-generated Image Detection

📅 2026-04-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing methods for detecting AI-generated images generalize poorly to unseen generative models, primarily because of shortcut biases in the frequency domain and representational conflicts between semantic and frequency features. To address these challenges, this work proposes a Frequency-aware Gated Injection Network (FGINet), which incorporates a Band-Masked Frequency Encoder (BMFE) to reduce reliance on generator-specific patterns. FGINet further employs a Layer-wise Gated Frequency Injection (LGFI) mechanism to adaptively integrate frequency cues into a vision foundation model. Additionally, it introduces Hyperspherical Compactness Learning (HCL) with cosine margin-based regularization to enhance feature discriminability. The authors report that the approach mitigates frequency shortcuts and cross-domain representation conflicts and achieves state-of-the-art cross-model generalization across multiple challenging benchmarks.
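To make the cosine-margin idea concrete, here is a minimal NumPy sketch of a large-margin cosine loss in the well-known CosFace style. It is an assumed stand-in, not the paper's HCL objective, whose exact form is not given in this summary. The function name, the `scale` and `margin` values, and the two-class (real vs. generated) prototype setup are all hypothetical.

```python
import numpy as np

def cosine_margin_loss(features, class_weights, labels, scale=16.0, margin=0.3):
    """Large-margin cosine loss (CosFace-style); an assumed stand-in for HCL."""
    # Project features and class prototypes onto the unit hypersphere.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = f @ w.T  # (batch, num_classes) cosine similarities

    # Subtract the margin from the true-class cosine only, so a sample must
    # beat the other class by at least `margin` to be scored as confident.
    one_hot = np.eye(w.shape[0])[labels]
    logits = scale * (cos - margin * one_hot)

    # Numerically stable softmax cross-entropy.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```

Because both features and prototypes are normalized, only angles matter; the margin pushes each class toward a tighter cluster on the sphere, which is one plausible reading of "compact and well-separated representations".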

📝 Abstract

AI-generated images are becoming increasingly realistic and diverse, posing significant challenges for generalizable detection. While Vision Foundation Models (VFMs) provide rich semantic representations and frequency-based methods capture complementary artifact cues, existing approaches that combine these modalities still suffer from limited generalization, with notable performance degradation on unseen generative models. We attribute this limitation to two key factors: frequency shortcut bias toward easily distinguishable cues associated with specific generators and cross-domain representation conflict between high-level semantics and low-level frequency patterns. To address these issues, we propose a Frequency-aware Gated Injection Network (FGINet) to improve generalization. Specifically, we design a Band-Masked Frequency Encoder (BMFE) that applies cross-band masking in the frequency domain to reduce reliance on generator-specific patterns and encourage more diverse and generalizable representations. We further introduce a Layer-wise Gated Frequency Injection (LGFI) mechanism to progressively inject frequency cues into the VFM backbone with adaptive gating, aligning with its hierarchical abstraction and alleviating representation conflict. Moreover, we propose a Hyperspherical Compactness Learning (HCL) framework with a cosine margin objective to learn compact and well-separated representations. Extensive experiments demonstrate that FGINet achieves state-of-the-art performance and strong generalization across multiple challenging datasets.
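The two mechanisms named in the abstract, band masking in the frequency domain and gated injection of frequency cues, can be illustrated with a short NumPy sketch. This is a sketch under stated assumptions rather than the authors' implementation: the concentric radial band layout, the function names, the choice of which bands to drop, and the single scalar sigmoid gate are hypothetical. In a layer-wise scheme one would presumably learn a separate gate per backbone layer and sample the dropped bands at random during training.

```python
import numpy as np

def radial_band_index(height, width, num_bands):
    """Assign every FFT bin to one of `num_bands` concentric radial bands."""
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)
    # Normalize the radius to [0, 1] and bucket it; the largest radius is
    # clamped into the last band.
    idx = np.floor(radius / radius.max() * num_bands).astype(int)
    return np.minimum(idx, num_bands - 1)

def band_mask(image, num_bands=4, drop_bands=(1,)):
    """Zero the chosen radial frequency bands of a 2-D grayscale image."""
    spectrum = np.fft.fft2(image)
    bands = radial_band_index(image.shape[0], image.shape[1], num_bands)
    keep = ~np.isin(bands, drop_bands)
    # The radial mask is symmetric, so the inverse transform is real up to
    # floating-point error.
    return np.fft.ifft2(spectrum * keep).real

def gated_injection(semantic_tokens, freq_tokens, gate_logit):
    """Residual injection: tokens + sigmoid(gate_logit) * frequency features."""
    gate = 1.0 / (1.0 + np.exp(-gate_logit))
    return semantic_tokens + gate * freq_tokens
```

A gate logit far below zero leaves the backbone tokens almost untouched, while a larger logit lets more of the frequency signal through. That gives a simple way to picture how an adaptive gate could limit the conflict between semantic and frequency features.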

Problem

Research questions and friction points this paper is trying to address.

AI-generated image detection

generalization

frequency artifacts

semantic fusion

cross-domain representation conflict

Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency-aware fusion

Gated Injection

Band-Masked Frequency Encoder