CINEMAE: Leveraging Frozen Masked Autoencoders for Cross-Generator AI Image Detection

📅 2025-11-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current AI-generated image detectors tend to overfit to generator-specific artifacts, which results in poor cross-generator generalization. This paper proposes CINEMAE, which adapts contextual inconsistency modeling, previously used for AI-generated text detection, to the visual domain. It uses a frozen Masked Autoencoder (MAE) for conditional reconstruction and treats reconstruction uncertainty, formalized as the conditional negative log-likelihood of masked patches given the visible context, as a transferable local semantic anomaly signal. Patch-level anomaly scores are combined with global MAE features through a learnable fusion mechanism. The MAE backbone requires no fine-tuning, and the detector makes no assumptions about generator architecture. Trained only on Stable Diffusion v1.4, it achieves over 95% accuracy on all eight unseen generative models in the GenImage benchmark, substantially outperforming state-of-the-art methods. Its core innovation lies in formulating MAE reconstruction uncertainty as a universal, generalizable detection cue, enabling robust cross-generator detection.
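The patch-level scoring described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the reconstruction residual of each masked patch is i.i.d. Gaussian with a fixed `sigma`, so the conditional NLL reduces to a scaled squared error plus a constant; the names `patch_nll` and `image_score` and the mean-plus-max aggregation are illustrative choices.

```python
import numpy as np

def patch_nll(patches, recon, sigma=0.1):
    """Per-patch Gaussian NLL of masked patches given their reconstructions.

    Assumes i.i.d. Gaussian residuals with fixed sigma (a simplification;
    the paper's exact likelihood model may differ).
    patches, recon: (N, D) arrays of N masked patches with D pixels each.
    Returns an (N,) array of NLL scores: higher = more anomalous.
    """
    sq_err = (patches - recon) ** 2
    return 0.5 * np.sum(sq_err / sigma**2 + np.log(2 * np.pi * sigma**2), axis=1)

def image_score(nll_scores):
    # Aggregate patch statistics into one image-level anomaly score
    # (mean + max is a placeholder for the learned aggregation).
    return float(np.mean(nll_scores) + np.max(nll_scores))
```

Under this assumption, patches whose MAE reconstruction deviates more from the observed pixels receive a higher NLL, which is the local anomaly signal the summary refers to.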

📝 Abstract
While context-based detectors have achieved strong generalization for AI-generated text by measuring distributional inconsistencies, image-based detectors still struggle with overfitting to generator-specific artifacts. We introduce CINEMAE, a novel paradigm for AIGC image detection that adapts the core principles of text detection methods to the visual domain. Our key insight is that a Masked AutoEncoder (MAE), trained to reconstruct masked patches conditioned on visible context, naturally encodes semantic consistency expectations. We formalize this reconstruction process probabilistically, computing the conditional Negative Log-Likelihood (NLL), −log p(masked | visible), to quantify local semantic anomalies. By aggregating these patch-level statistics with global MAE features through learned fusion, CINEMAE achieves strong cross-generator generalization. Trained exclusively on Stable Diffusion v1.4, our method achieves over 95% accuracy on all eight unseen generators in the GenImage benchmark, substantially outperforming state-of-the-art detectors. This demonstrates that context-conditional reconstruction uncertainty provides a robust, transferable signal for AIGC detection.
Problem

Research questions and friction points this paper is trying to address.

Detecting AI-generated images across different generators effectively
Overcoming overfitting to specific generator artifacts in detection
Quantifying semantic inconsistencies using reconstruction uncertainty metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses frozen Masked Autoencoders for semantic consistency
Quantifies local anomalies via conditional reconstruction likelihood
Fuses patch statistics with global features for generalization
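The fusion idea in the last bullet can be sketched as a tiny classifier over the concatenation of the two views. This is a hedged illustration, not the paper's module: `fuse_and_classify`, the linear-plus-sigmoid form, and the weights `W`, `b` are placeholders standing in for the learned fusion described in the paper.

```python
import numpy as np

def fuse_and_classify(global_feat, patch_stats, W, b):
    """Fuse a global MAE feature vector with patch-level NLL statistics.

    Minimal sketch: concatenate the global embedding with summary statistics
    of the patch anomaly scores (e.g. mean and max NLL), then apply a linear
    classifier with a sigmoid. W and b stand in for trained parameters.
    Returns the probability that the image is AI-generated.
    """
    z = np.concatenate([global_feat, patch_stats])
    logit = W @ z + b
    return 1.0 / (1.0 + np.exp(-logit))
```

In this sketch, larger patch-level NLL statistics push the logit up, so images with stronger local semantic anomalies are scored as more likely AI-generated.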