🤖 AI Summary
This work addresses the limited generalization of existing AI-generated image detection methods, which often rely on model-specific artifacts. To overcome this, the authors propose the Multi-Cue Aggregation Network (MCAN), a unified framework that adaptively integrates complementary forensic cues—spatial content, high-frequency edges, and chromatic inconsistencies—through a mixture-of-encoders adapter. This design enables adaptive feature fusion and discriminative representation learning, better modeling characteristics intrinsic to real images. Extensive experiments show that MCAN achieves state-of-the-art performance across three major benchmarks: GenImage, Chameleon, and UniversalFakeDetect. Notably, on GenImage it improves average accuracy by up to 7.4% over the previous best method, underscoring its robustness in detecting images from diverse generative models.
📝 Abstract
The rapid emergence of image synthesis models poses challenges to the generalization of AI-generated image detectors: existing methods often rely on model-specific features, leading to overfitting and poor generalization. In this paper, we introduce the Multi-Cue Aggregation Network (MCAN), a novel framework that integrates distinct yet complementary cues in a unified network. MCAN employs a mixture-of-encoders adapter to dynamically process these cues, enabling more adaptive and robust feature representation. Our cues include the input image itself, which captures the overall content, and high-frequency components that emphasize edge details. Additionally, we introduce a Chromatic Inconsistency (CI) cue, which normalizes intensity values and captures noise introduced during the image acquisition process of real images, making these noise patterns more distinguishable from those in AI-generated content. Unlike prior methods, MCAN unifies spatial, frequency-domain, and chromaticity-based information in a single multi-cue aggregation framework for enhanced representation learning. Because these cues are intrinsically more indicative of real images, they improve cross-model generalization. Extensive experiments on the GenImage, Chameleon, and UniversalFakeDetect benchmarks validate the state-of-the-art performance of MCAN. On the GenImage dataset, MCAN outperforms the best state-of-the-art method by up to 7.4% in average accuracy (ACC) across eight different image generators.
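The cue-extraction ideas in the abstract can be illustrated with a minimal sketch. The abstract does not give exact formulations, so the specifics below are assumptions: chromaticity here means dividing each channel by per-pixel intensity and subtracting a local average to expose residual chrominance noise, and the high-frequency cue is a plain 4-neighbour Laplacian. The paper's actual CI cue and frequency decomposition may differ.

```python
import numpy as np

def _box_blur(x, k=5):
    """Local average over a k x k window (edge-padded), used to remove
    the smooth chromaticity component so only noise-like residue remains."""
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)), mode="edge")
    out = np.zeros_like(x)
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out / (k * k)

def chromatic_inconsistency_cue(img):
    """Hypothetical CI cue: normalize out intensity, then keep the
    high-frequency chrominance residual (not the paper's exact recipe)."""
    img = img.astype(np.float64)
    intensity = img.sum(axis=-1, keepdims=True) + 1e-8
    chroma = img / intensity  # per-pixel chromaticity, channels sum to ~1
    return chroma - _box_blur(chroma)

def high_frequency_cue(img):
    """Edge-emphasizing cue: 4-neighbour Laplacian of the grayscale image."""
    gray = img.astype(np.float64).mean(axis=-1)
    p = np.pad(gray, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * gray)
```

A perfectly flat image yields zero for both cues, which matches the intuition that these maps respond only to noise and edges, not to smooth content.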