When Semantics Regulate: Rethinking Patch Shuffle and Internal Bias for Generated Image Detection with CLIP

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing CLIP-based detectors over-rely on semantic cues while neglecting generative artifacts, leading to poor generalization across generators and distributions. To address this, we propose SemAnti, a semantic-antagonistic paradigm. First, we introduce Patch Shuffle, a structural perturbation that disrupts global semantic coherence while preserving low-level artifact patterns. Second, we identify CLIP's deep semantic subspace as an effective feature regulator; we freeze this subspace and apply lightweight fine-tuning exclusively to artifact-sensitive layers. This is further enhanced by hierarchical feature analysis and semantic entropy regularization, which jointly suppress semantic bias and amplify artifact signals. Evaluated on AIGCDetectBenchmark and GenImage, SemAnti achieves state-of-the-art detection performance and significantly improves cross-domain robustness, marking the first framework to co-optimize semantic suppression and artifact enhancement in CLIP-based detection.
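The paper does not spell out the exact form of its semantic entropy regularization; as a hedged illustration, one plausible building block is the Shannon entropy of a per-sample semantic class distribution (e.g. a softmax over CLIP zero-shot class similarities). The function name and formulation below are assumptions, not the authors' implementation.

```python
import numpy as np

def semantic_entropy(probs, eps=1e-9):
    """Shannon entropy (in nats) of per-sample semantic class distributions.

    `probs` has shape (..., num_classes) with rows summing to 1, e.g. a
    softmax over CLIP's zero-shot class similarities. A regularizer built
    on this quantity can be used to monitor or penalize how much semantic
    class information a representation carries.
    """
    p = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=-1)
```

A uniform distribution over K classes attains the maximum entropy log(K), while a one-hot (fully confident semantic) prediction has entropy near zero.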

📝 Abstract
The rapid progress of GANs and Diffusion Models poses new challenges for detecting AI-generated images. Although CLIP-based detectors exhibit promising generalization, they often rely on semantic cues rather than generator artifacts, leading to brittle performance under distribution shifts. In this work, we revisit the nature of semantic bias and uncover that Patch Shuffle provides an unusually strong benefit for CLIP: it disrupts global semantic continuity while preserving local artifact cues, which reduces semantic entropy and homogenizes feature distributions between natural and synthetic images. Through a detailed layer-wise analysis, we further show that CLIP's deep semantic structure functions as a regulator that stabilizes cross-domain representations once semantic bias is suppressed. Guided by these findings, we propose SemAnti, a semantic-antagonistic fine-tuning paradigm that freezes the semantic subspace and adapts only artifact-sensitive layers under shuffled semantics. Despite its simplicity, SemAnti achieves state-of-the-art cross-domain generalization on AIGCDetectBenchmark and GenImage, demonstrating that regulating semantics is key to unlocking CLIP's full potential for robust AI-generated image detection.
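The Patch Shuffle perturbation described in the abstract can be sketched as follows; the patch size and NumPy formulation are assumptions for illustration (the paper operates on CLIP's input, where 16 or 32 would match common ViT patch sizes). Each patch's internal pixel statistics, where low-level generator artifacts live, are untouched; only the global arrangement, i.e. the semantic layout, is destroyed.

```python
import numpy as np

def patch_shuffle(img, patch=16, seed=None):
    """Randomly permute the non-overlapping patches of an (H, W, C) image.

    Pixels inside each patch keep their relative positions (preserving
    local artifact cues); the patch grid itself is randomly re-ordered
    (disrupting global semantic continuity).
    """
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly into patches"
    gh, gw = h // patch, w // patch
    # Decompose into a (gh, gw, patch, patch, c) grid of patches.
    grid = img.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    flat = grid.reshape(gh * gw, patch, patch, c)
    rng = np.random.default_rng(seed)
    flat = flat[rng.permutation(gh * gw)]  # shuffle patch order
    # Reassemble the shuffled grid back into an (H, W, C) image.
    out = flat.reshape(gh, gw, patch, patch, c).transpose(0, 2, 1, 3, 4)
    return out.reshape(h, w, c)
```

Because the operation is a pure permutation, the output contains exactly the same set of patches (and pixels) as the input, just in a different global arrangement.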
Problem

Research questions and friction points this paper is trying to address.

Detecting AI-generated images from GANs and Diffusion Models
Addressing semantic bias in CLIP-based detection methods
Improving cross-domain generalization for synthetic image detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Patch Shuffle disrupts semantics while preserving artifact cues
SemAnti fine-tunes artifact-sensitive layers under shuffled semantics
Freezing semantic subspace stabilizes cross-domain feature representations
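The freeze-and-adapt recipe in the bullets above can be sketched as a partition over encoder parameters. The `blocks.{i}` naming scheme, the choice of which block indices count as artifact-sensitive, and the helper name are all assumptions (the paper identifies the relevant layers via its layer-wise analysis and does not publish fixed indices here).

```python
def split_trainable(param_names, artifact_blocks=(0, 1, 2, 3)):
    """Partition encoder parameter names into (trainable, frozen) lists.

    Parameters belonging to one of `artifact_blocks` (hypothetically the
    shallower transformer blocks, assumed most sensitive to low-level
    artifacts) stay trainable; everything else, including the deep
    semantic blocks that act as a frozen regulator, is frozen.
    """
    prefixes = tuple(f"blocks.{i}." for i in artifact_blocks)
    trainable = [n for n in param_names if n.startswith(prefixes)]
    frozen = [n for n in param_names if not n.startswith(prefixes)]
    return trainable, frozen
```

In a PyTorch training loop one would then set `p.requires_grad = False` for every parameter in the frozen list and hand only the trainable subset to the optimizer.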
👥 Authors
Beilin Chu, Beijing University of Posts and Telecommunications (AI, multi-model learning, AIGC detection)
Weike You, School of Cyberspace Security, Beijing University of Posts and Telecommunications
Mengtao Li, School of Cyberspace Security, Beijing University of Posts and Telecommunications
Tingting Zheng, Harbin Institute of Technology (computer vision, deep learning, medical image analysis)
Kehan Zhao, School of Cyberspace Security, Beijing University of Posts and Telecommunications
Xuan Xu, School of Cyberspace Security, Beijing University of Posts and Telecommunications
Zhigao Lu, School of Cyberspace Security, Beijing University of Posts and Telecommunications
Jia Song, Assistant Professor, University of Idaho (cybersecurity)
Moxuan Xu, School of Finance, Central University of Finance and Economics
Linna Zhou, School of Cyberspace Security, Beijing University of Posts and Telecommunications