🤖 AI Summary
Existing AI-generated image detection methods predominantly rely on low-resolution inputs and suffer substantial performance degradation on high-resolution, photorealistic forgeries; conventional preprocessing, such as downscaling or cropping, often discards high-frequency artifacts and fine-grained local details. To address this, we propose the first pixel-complete high-resolution detection framework: it covers every pixel by extracting local features from full-resolution patches and fusing them with a globally down-sampled view of the image. We further introduce a feature aggregation module, a token-level forgery localization mechanism, and a JPEG quality factor estimation module to enable fine-grained detection and to explicitly decouple compression noise from generative artifacts. Evaluated on Chameleon and our newly curated HiRes-50K benchmark, our method improves accuracy by 13% and 10%, respectively, demonstrating significantly enhanced robustness against both high-resolution forgeries and compression-induced distortions.
📝 Abstract
The rapid growth of high-resolution, meticulously crafted AI-generated images poses a significant challenge to existing detection methods, which are often trained and evaluated on low-resolution, automatically generated datasets that do not reflect the complexities of high-resolution scenarios. A common practice is to resize or center-crop high-resolution images to fit standard network inputs. However, because such strategies do not cover all pixels, they risk either obscuring subtle, high-frequency artifacts or discarding information from uncovered regions entirely. In this paper, we introduce the High-Resolution Detail-Aggregation Network (HiDA-Net), a novel framework that ensures no pixel is left behind. We propose a Feature Aggregation Module (FAM) that fuses features from multiple full-resolution local tiles with a down-sampled global view of the image. These local features are aggregated and fused with global representations for final prediction, ensuring that native-resolution details are preserved and exploited for detection. To enhance robustness against challenges such as localized AI manipulations and compression, we introduce a Token-wise Forgery Localization (TFL) module for fine-grained spatial sensitivity and a JPEG Quality Factor Estimation (QFE) module to explicitly disentangle generative artifacts from compression noise. Furthermore, to facilitate future research, we introduce HiRes-50K, a new challenging benchmark consisting of 50,568 images at resolutions of up to 64 megapixels. Extensive experiments show that HiDA-Net achieves state-of-the-art performance, increasing accuracy by over 13% on the challenging Chameleon dataset and by 10% on our HiRes-50K.
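The "no pixel left behind" idea, tiling the full-resolution image into local patches and fusing their pooled features with a feature of a down-sampled global view, can be sketched as follows. This is an illustrative NumPy mock-up, not the paper's implementation: the `encode` function is a stand-in for a real backbone, and the tile size, padding scheme, mean-pooling over tiles, and fusion by concatenation are all assumptions.

```python
import numpy as np

def tile_image(img: np.ndarray, tile: int) -> list:
    """Split an HxWxC image into non-overlapping tile x tile patches,
    zero-padding the borders so every pixel is covered."""
    h, w, _ = img.shape
    ph, pw = (-h) % tile, (-w) % tile
    img = np.pad(img, ((0, ph), (0, pw), (0, 0)))
    return [img[y:y + tile, x:x + tile]
            for y in range(0, img.shape[0], tile)
            for x in range(0, img.shape[1], tile)]

def encode(x: np.ndarray, dim: int = 8) -> np.ndarray:
    """Stand-in for a CNN/ViT backbone (assumption, not the paper's
    encoder): a fixed random projection of channel-wise mean statistics."""
    rng = np.random.default_rng(0)  # fixed "weights" for illustration
    w = rng.standard_normal((x.shape[-1], dim))
    return x.mean(axis=(0, 1)) @ w

def fused_features(img: np.ndarray, tile: int = 224) -> np.ndarray:
    """Mean-pool features of all full-resolution local tiles, then
    concatenate with a feature of the down-sampled global view."""
    local = np.mean([encode(t) for t in tile_image(img, tile)], axis=0)
    global_view = img[::4, ::4]  # naive 4x down-sampling of the whole image
    return np.concatenate([local, encode(global_view)])

# A synthetic "high-resolution" image: 1024 x 1536, 3 channels.
img = np.random.default_rng(1).random((1024, 1536, 3))
feat = fused_features(img)
print(feat.shape)  # fused local + global descriptor: (16,)
```

In the actual model, the pooled local features would be learned jointly with the global branch and fed to a classification head; the sketch only shows how full coverage of all pixels coexists with a fixed-size fused representation.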