🤖 AI Summary
Existing CLIP-based detectors over-rely on semantic cues while neglecting generative artifacts, leading to poor generalization across generators and distributions. To address this, we propose SemAnti, a semantic-antagonistic paradigm. First, we introduce Patch Shuffle, a structural perturbation that disrupts global semantic coherence while preserving local low-level artifact patterns. Second, we identify CLIP's deep semantic subspace as an effective feature regulator: we freeze this subspace and apply lightweight fine-tuning exclusively to artifact-sensitive layers. Hierarchical feature analysis and semantic-entropy regularization further jointly suppress semantic bias and amplify artifact signals. Evaluated on AIGCDetectBenchmark and GenImage, SemAnti achieves state-of-the-art detection performance and markedly improves cross-domain robustness, making it, to our knowledge, the first framework to jointly optimize semantic suppression and artifact enhancement in CLIP-based detection.
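The freeze-then-tune scheme above can be sketched in a few lines of PyTorch. This is a minimal illustration on a toy residual stack, not the authors' implementation: the split index `n_artifact_layers` and the assumption that the shallow layers are the artifact-sensitive ones are placeholders for the layer-wise analysis the paper performs on the real CLIP encoder.

```python
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Toy stand-in for a CLIP visual encoder: a stack of residual blocks."""
    def __init__(self, depth=8, dim=16):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))

    def forward(self, x):
        for blk in self.blocks:
            x = torch.relu(blk(x)) + x
        return x

def semantic_antagonistic_freeze(encoder, n_artifact_layers=2):
    """Freeze the deep 'semantic subspace' and leave only the shallow,
    artifact-sensitive layers trainable (the split point is an assumption)."""
    for i, blk in enumerate(encoder.blocks):
        trainable = i < n_artifact_layers  # shallow layers keep gradients
        for p in blk.parameters():
            p.requires_grad = trainable

enc = ToyEncoder(depth=8, dim=16)
semantic_antagonistic_freeze(enc, n_artifact_layers=2)
# Only parameters with requires_grad=True would be passed to the optimizer,
# so fine-tuning cannot perturb the frozen deep semantic representation.
tuned = [p for p in enc.parameters() if p.requires_grad]
```

In practice one would hand `tuned` (rather than `enc.parameters()`) to the optimizer, which is what makes the adaptation "lightweight": only a small fraction of the encoder's weights receive gradient updates.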
📝 Abstract
The rapid progress of GANs and diffusion models poses new challenges for detecting AI-generated images. Although CLIP-based detectors exhibit promising generalization, they often rely on semantic cues rather than generator artifacts, leading to brittle performance under distribution shifts. In this work, we revisit the nature of semantic bias and uncover that Patch Shuffle, which disrupts global semantic continuity while preserving local artifact cues, provides an unusually strong benefit for CLIP: it reduces semantic entropy and homogenizes feature distributions between natural and synthetic images. Through a detailed layer-wise analysis, we further show that CLIP's deep semantic structure functions as a regulator that stabilizes cross-domain representations once semantic bias is suppressed. Guided by these findings, we propose SemAnti, a semantic-antagonistic fine-tuning paradigm that freezes the semantic subspace and adapts only artifact-sensitive layers under shuffled semantics. Despite its simplicity, SemAnti achieves state-of-the-art cross-domain generalization on AIGCDetectBenchmark and GenImage, demonstrating that regulating semantics is key to unlocking CLIP's full potential for robust AI-generated image detection.
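The Patch Shuffle perturbation can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the patch size is a free hyperparameter, and the idea is simply that pixels *within* each patch (where low-level generator artifacts live) are untouched while the global arrangement, and hence the semantic layout, is randomized.

```python
import numpy as np

def patch_shuffle(img, patch, seed=None):
    """Randomly permute non-overlapping patches of an H x W x C image.

    Local statistics inside each patch are preserved exactly; only the
    global (semantic) arrangement of the patches is destroyed.
    """
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    # Split into a grid of patches: (gh, gw, patch, patch, c)
    grid = img.reshape(gh, patch, gw, patch, c).swapaxes(1, 2)
    flat = grid.reshape(gh * gw, patch, patch, c)
    # Permute the patches, then reassemble the grid into an image
    rng = np.random.default_rng(seed)
    flat = flat[rng.permutation(gh * gw)]
    return flat.reshape(gh, gw, patch, patch, c).swapaxes(1, 2).reshape(h, w, c)
```

Applied as a training-time transform before the CLIP encoder, the shuffled image keeps the same multiset of patches, so any detector that still succeeds must be reading artifact cues rather than scene semantics.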