🤖 AI Summary
Existing AI-generated image detection datasets focus predominantly on object-level manipulations and neglect scene-level edits, such as sky or ground modifications, which limits generalizability. To address this, the paper introduces BR-Gen, the first large-scale (150K samples), scene-region-oriented dataset for localized AI forgery detection, and proposes NFA-ViT, a noise-guided forgery-amplification Vision Transformer. The key contributions are: (1) a scene-aware, fine-grained annotation paradigm; (2) a dual-component mechanism that couples noise-fingerprint localization with multi-region, attention-based feature interaction so that forensic cues propagate globally; and (3) semantically calibrated labels produced by an automated Perception-Creation-Evaluation pipeline. Experiments demonstrate that NFA-ViT achieves significant gains over state-of-the-art methods on BR-Gen and exhibits strong cross-dataset generalization across multiple established benchmarks.
📝 Abstract
The rise of AI-generated image editing tools has made localized forgeries increasingly realistic, posing challenges for visual content integrity. Although recent efforts have explored localized AIGC detection, existing datasets predominantly focus on object-level forgeries while overlooking broader scene edits in regions such as sky or ground. To address these limitations, we introduce **BR-Gen**, a large-scale dataset of 150,000 locally forged images with diverse scene-aware annotations, which are semantically calibrated to ensure high-quality samples. BR-Gen is constructed through a fully automated Perception-Creation-Evaluation pipeline that ensures semantic coherence and visual realism. In addition, we propose **NFA-ViT**, a Noise-guided Forgery Amplification Vision Transformer that enhances the detection of localized forgeries by amplifying forgery-related features across the entire image. NFA-ViT mines heterogeneous regions in an image, *i.e.*, potentially edited areas, via noise fingerprints. An attention mechanism is then introduced to force interaction between normal and abnormal features, propagating forgery traces throughout the entire image; subtle forgeries thus influence a broader context, improving overall detection robustness. Extensive experiments demonstrate that BR-Gen covers entirely new scenarios not addressed by existing datasets. Going a step further, NFA-ViT outperforms existing methods on BR-Gen and generalizes well across current benchmarks. All data and code are available at https://github.com/clpbc/BR-Gen.
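The amplification idea described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: NFA-ViT uses learned noise fingerprints and transformer attention, whereas here a crude high-pass residual stands in for the noise extractor, and a softmax-weighted pooling stands in for attention. All function names and parameters (`noise_residual`, `patch_energies`, `amplify`, `tau`) are hypothetical.

```python
import numpy as np

def noise_residual(img, k=3):
    """Crude noise fingerprint: image minus a k x k local mean.
    (A stand-in for a learned high-pass noise extractor.)"""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    smooth = np.zeros_like(img, dtype=float)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            smooth[i, j] = padded[i:i + k, j:j + k].mean()
    return img - smooth

def patch_energies(residual, p=8):
    """Mean absolute residual per non-overlapping p x p patch.
    High energy marks 'heterogeneous' (potentially edited) regions."""
    H, W = residual.shape
    return np.array([
        np.abs(residual[i:i + p, j:j + p]).mean()
        for i in range(0, H, p) for j in range(0, W, p)
    ])

def amplify(tokens, energies, tau=2.0):
    """Noise-guided amplification sketch: patches with high residual
    energy get larger weights, and their pooled features are added to
    every token, spreading the forgery cue across the whole image."""
    w = np.exp(tau * (energies - energies.max()))
    w = w / w.sum()
    context = (w[:, None] * tokens).sum(axis=0)  # cue from suspect patches
    return tokens + context  # broadcast cue to all patch tokens

# Toy usage: a 32x32 image with an 8x8 noisy "forged" block.
rng = np.random.default_rng(0)
img = np.zeros((32, 32))
img[8:16, 8:16] = rng.normal(0, 5, size=(8, 8))
energies = patch_energies(noise_residual(img), p=8)
suspect = int(np.argmax(energies))  # index of the flagged patch
```

In this toy setup the forged block aligns with one patch of the 4x4 grid, so its index has the highest residual energy, and `amplify` mixes that patch's features into every token, mimicking how the paper lets a small edited region influence the global representation.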