🤖 AI Summary
Existing methods for detecting and localizing generative image manipulation lack large-scale, diverse benchmarks and efficient, scalable approaches. Method: This paper builds a local manipulation synthesis pipeline that integrates SAM-based segmentation, LLM-driven prompting, and diffusion/NeRF-based generation, and uses it to construct GIM, the first million-scale benchmark (1M+ pairs of AI-manipulated and authentic images), covering diverse content domains and state-of-the-art generative models (e.g., diffusion models, NeRF). It also proposes GIMFormer, a novel detection and localization framework with three core modules: ShadowTracer for precise localization, a Frequency-Spatial Block (FSB) for joint frequency-spatial modeling, and a Multi-Window Anomalous Modeling (MWAM) module for multi-window anomaly modeling. Contribution/Results: GIMFormer achieves significant improvements over SOTA on GIM and multiple public benchmarks (e.g., IMDF, FakeBench), while the GIM benchmark enables a more comprehensive evaluation of diversity, robustness, and generalization in generative image forensics.
📝 Abstract
The extraordinary capabilities of generative models have made them a new trend in image editing and realistic image generation, posing a serious threat to the trustworthiness of multimedia data and driving research on image manipulation detection and localization (IMDL). However, the lack of a large-scale data foundation hinders progress on the IMDL task. In this paper, we build a local manipulation data generation pipeline that integrates the powerful capabilities of SAM, LLMs, and generative models. On this basis, we propose the GIM dataset, which has the following advantages: 1) Large scale: GIM includes over one million pairs of AI-manipulated and real images. 2) Rich image content: GIM encompasses a broad range of image classes. 3) Diverse generative manipulation: the images are manipulated with state-of-the-art generators across various manipulation tasks. These advantages allow for a more comprehensive evaluation of IMDL methods and extend their applicability to diverse images. We introduce the GIM benchmark with two settings to evaluate existing IMDL methods. In addition, we propose a novel IMDL framework, termed GIMFormer, which consists of a ShadowTracer, a Frequency-Spatial Block (FSB), and a Multi-Window Anomalous Modeling (MWAM) module. Extensive experiments on GIM demonstrate that GIMFormer surpasses the previous state-of-the-art approach on two different benchmarks.
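The abstract does not say which metrics the GIM benchmark reports. As a purely illustrative sketch, the snippet below shows how localization quality is commonly scored in IMDL work: pixel-level F1 and IoU between a predicted binary mask and the ground-truth manipulation mask. The function names, the nested-list mask representation, and the convention for two empty masks are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Tuple

# Binary mask as nested lists: 1 = manipulated pixel, 0 = authentic pixel.
# (This representation is an assumption made for the example.)
Mask = List[List[int]]


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) over all pixels."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def pixel_f1(pred: Mask, truth: Mask) -> float:
    """Pixel-level F1 score: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2 * tp / denom


def pixel_iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of the two manipulated regions: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = tp + fp + fn
    return 1.0 if denom == 0 else tp / denom


# Hypothetical 3x3 example: the prediction misses one manipulated pixel
# and flags one authentic pixel.
truth_mask = [[0, 1, 1],
              [0, 1, 1],
              [0, 0, 0]]
pred_mask = [[0, 0, 1],
             [0, 1, 1],
             [0, 1, 0]]

print(pixel_f1(pred_mask, truth_mask))   # 0.75
print(pixel_iou(pred_mask, truth_mask))  # 0.6
```

For an authentic image the ground-truth mask is empty, so any predicted pixel is a false positive. This is one reason image-level detection is often reported separately from pixel-level localization.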