🤖 AI Summary
Existing CLIP-based detectors over-rely on semantic cues while neglecting generative artifacts, leading to poor generalization across generators and distributions. To address this, we propose SemAnti, a semantic-antagonistic paradigm. First, we introduce Patch Shuffle, a structural perturbation that disrupts global semantic coherence while preserving local low-level artifact patterns. Second, we identify CLIP's deep semantic subspace as an effective feature regulator: we freeze this subspace and apply lightweight fine-tuning exclusively to artifact-sensitive layers. Hierarchical feature analysis and semantic-entropy regularization further jointly suppress semantic bias and amplify artifact signals. Evaluated on AIGCDetectBenchmark and GenImage, SemAnti achieves state-of-the-art detection performance and markedly improves cross-domain robustness, making it, to our knowledge, the first framework to jointly optimize semantic suppression and artifact enhancement in CLIP-based detection.
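The freeze-then-tune scheme above can be sketched in a few lines of PyTorch. This is a minimal illustration on a toy residual stack, not the authors' implementation: the split index `n_artifact_layers` and the assumption that the shallow layers are the artifact-sensitive ones are placeholders for the layer-wise analysis the paper performs on the real CLIP encoder.

```python
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Toy stand-in for a CLIP visual encoder: a stack of residual blocks."""
    def __init__(self, depth=8, dim=16):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))

    def forward(self, x):
        for blk in self.blocks:
            x = torch.relu(blk(x)) + x
        return x

def semantic_antagonistic_freeze(encoder, n_artifact_layers=2):
    """Freeze the deep 'semantic subspace' and leave only the shallow,
    artifact-sensitive layers trainable (the split point is an assumption)."""
    for i, blk in enumerate(encoder.blocks):
        trainable = i < n_artifact_layers  # shallow layers keep gradients
        for p in blk.parameters():
            p.requires_grad = trainable

enc = ToyEncoder(depth=8, dim=16)
semantic_antagonistic_freeze(enc, n_artifact_layers=2)
# Only parameters with requires_grad=True would be passed to the optimizer,
# so fine-tuning cannot perturb the frozen deep semantic representation.
tuned = [p for p in enc.parameters() if p.requires_grad]
```

In practice one would hand `tuned` (rather than `enc.parameters()`) to the optimizer, which is what makes the adaptation "lightweight": only a small fraction of the encoder's weights receive gradient updates.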
📝 Abstract
The rapid progress of GANs and diffusion models poses new challenges for detecting AI-generated images. Although CLIP-based detectors exhibit promising generalization, they often rely on semantic cues rather than generator artifacts, leading to brittle performance under distribution shifts. In this work, we revisit the nature of semantic bias and uncover that Patch Shuffle, which disrupts global semantic continuity while preserving local artifact cues, provides an unusually strong benefit for CLIP: it reduces semantic entropy and homogenizes feature distributions between natural and synthetic images. Through a detailed layer-wise analysis, we further show that CLIP's deep semantic structure functions as a regulator that stabilizes cross-domain representations once semantic bias is suppressed. Guided by these findings, we propose SemAnti, a semantic-antagonistic fine-tuning paradigm that freezes the semantic subspace and adapts only artifact-sensitive layers under shuffled semantics. Despite its simplicity, SemAnti achieves state-of-the-art cross-domain generalization on AIGCDetectBenchmark and GenImage, demonstrating that regulating semantics is key to unlocking CLIP's full potential for robust AI-generated image detection.
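The Patch Shuffle perturbation can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the patch size is a free hyperparameter, and the idea is simply that pixels *within* each patch (where low-level generator artifacts live) are untouched while the global arrangement, and hence the semantic layout, is randomized.

```python
import numpy as np

def patch_shuffle(img, patch, seed=None):
    """Randomly permute non-overlapping patches of an H x W x C image.

    Local statistics inside each patch are preserved exactly; only the
    global (semantic) arrangement of the patches is destroyed.
    """
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    # Split into a grid of patches: (gh, gw, patch, patch, c)
    grid = img.reshape(gh, patch, gw, patch, c).swapaxes(1, 2)
    flat = grid.reshape(gh * gw, patch, patch, c)
    # Permute the patches, then reassemble the grid into an image
    rng = np.random.default_rng(seed)
    flat = flat[rng.permutation(gh * gw)]
    return flat.reshape(gh, gw, patch, patch, c).swapaxes(1, 2).reshape(h, w, c)
```

Applied as a training-time transform before the CLIP encoder, the shuffled image keeps the same multiset of patches, so any detector that still succeeds must be reading artifact cues rather than scene semantics.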