No Pixel Left Behind: A Detail-Preserving Architecture for Robust High-Resolution AI-Generated Image Detection

📅 2025-08-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing AI-generated image detection methods predominantly rely on low-resolution inputs and suffer substantial performance degradation on high-resolution, photorealistic forgeries; conventional preprocessing—such as downscaling or cropping—often discards high-frequency artifacts and fine-grained local details. To address this, we propose the first pixel-complete high-resolution detection framework: it jointly models all pixels via multi-patch full-resolution local feature extraction and globally downsampled view fusion. We further introduce a feature aggregation module, a token-level forgery localization mechanism, and a JPEG quality factor estimation module to enable fine-grained detection and explicit decoupling of compression noise from generative artifacts. Evaluated on Chameleon and our newly curated HiRes-50K benchmark, our method achieves +13% and +10% accuracy improvements, respectively, demonstrating significantly enhanced robustness against both high-resolution forgeries and compression-induced distortions.

📝 Abstract
The rapid growth of high-resolution, meticulously crafted AI-generated images poses a significant challenge to existing detection methods, which are often trained and evaluated on low-resolution, automatically generated datasets that do not reflect the complexities of high-resolution scenarios. A common practice is to resize or center-crop high-resolution images to fit standard network inputs. However, without full coverage of all pixels, such strategies risk either obscuring subtle, high-frequency artifacts or discarding information from uncovered regions, leading to input information loss. In this paper, we introduce the High-Resolution Detail-Aggregation Network (HiDA-Net), a novel framework that ensures no pixel is left behind. Its Feature Aggregation Module (FAM) fuses features from multiple full-resolution local tiles with a down-sampled global view of the image. These local features are aggregated and fused with global representations for the final prediction, ensuring that native-resolution details are preserved and utilized for detection. To enhance robustness against challenges such as localized AI manipulations and compression, we introduce a Token-wise Forgery Localization (TFL) module for fine-grained spatial sensitivity and a JPEG Quality Factor Estimation (QFE) module to explicitly disentangle generative artifacts from compression noise. Furthermore, to facilitate future research, we introduce HiRes-50K, a new challenging benchmark consisting of 50,568 images at up to 64 megapixels. Extensive experiments show that HiDA-Net achieves state-of-the-art performance, increasing accuracy by over 13% on the challenging Chameleon dataset and by 10% on our HiRes-50K.
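The "no pixel left behind" input strategy from the abstract can be sketched as follows: every pixel of the high-resolution image lands in exactly one full-resolution local tile, while a single low-resolution copy supplies the global view. This is a minimal illustrative sketch; the tile size of 224, the zero-padding, and the strided downsampling are assumptions, not details taken from the paper.

```python
import numpy as np

def make_views(image: np.ndarray, tile: int = 224) -> tuple[np.ndarray, np.ndarray]:
    """Return (local_tiles, global_view) jointly covering all pixels of `image` (H, W, C)."""
    h, w, c = image.shape
    # Zero-pad so both dimensions are multiples of the tile size (no pixel is dropped).
    ph, pw = (-h) % tile, (-w) % tile
    padded = np.pad(image, ((0, ph), (0, pw), (0, 0)))
    hh, ww = padded.shape[:2]
    # Cut the padded image into a grid of non-overlapping full-resolution tiles.
    tiles = (padded.reshape(hh // tile, tile, ww // tile, tile, c)
                   .transpose(0, 2, 1, 3, 4)
                   .reshape(-1, tile, tile, c))
    # Crude strided downsample as a stand-in for the down-sampled global view.
    sh, sw = max(1, h // tile), max(1, w // tile)
    global_view = image[::sh, ::sw][:tile, :tile]
    return tiles, global_view
```

In HiDA-Net the tiles and the global view would each be encoded and then fused by the FAM; here they are simply returned, since the fusion itself is the learned part of the architecture.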
Problem

Research questions and friction points this paper is trying to address.

Detecting high-resolution AI-generated images without losing pixel-level details
Addressing information loss from resizing or cropping high-resolution images
Improving robustness against localized manipulations and compression artifacts
Innovation

Methods, ideas, or system contributions that make the work stand out.

High-Resolution Detail-Aggregation Network for full pixel coverage
Feature Aggregation Module fuses local tiles with global view
Token-wise Forgery Localization and JPEG Quality Estimation modules
Lianrui Mu
Zhejiang University
Zou Xingze
Zhejiang University
Jianhong Bai
Zhejiang University
Jiaqi Hu
Rice University; Genentech
Wenjie Zheng
Zhejiang University
Jiangnan Ye
Zhejiang University
Jiedong Zhuang
Zhejiang University
Mudassar Ali
Zhejiang University
Jing Wang
Zhejiang University
Haoji Hu
Zhejiang University, China