GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

📅 2024-06-24
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Research on image manipulation detection and localization (IMDL) for generative edits lacks a large-scale, diverse benchmark. Method: This paper builds a local manipulation data generation pipeline that combines SAM, an LLM, and generative models, and uses it to construct GIM, a dataset of over one million pairs of AI-manipulated and real images covering a broad range of image classes, state-of-the-art generators, and manipulation tasks. It also proposes GIMFormer, an IMDL framework consisting of a ShadowTracer, a Frequency-Spatial Block (FSB), and a Multi-Window Anomalous Modeling (MWAM) module. Contribution/Results: The GIM benchmark evaluates existing IMDL methods under two settings, and GIMFormer surpasses the previous state-of-the-art approach on two different benchmarks.

📝 Abstract
The extraordinary ability of generative models has emerged as a new trend in image editing and realistic image generation, posing a serious threat to the trustworthiness of multimedia data and driving research on image manipulation detection and localization (IMDL). However, the lack of a large-scale data foundation makes the IMDL task unattainable. In this paper, we build a local manipulation data generation pipeline that integrates the powerful capabilities of SAM, LLMs, and generative models. On this basis, we propose the GIM dataset, which has the following advantages: 1) Large scale: GIM includes over one million pairs of AI-manipulated images and real images. 2) Rich image content: GIM encompasses a broad range of image classes. 3) Diverse generative manipulation: the images are manipulated with state-of-the-art generators across various manipulation tasks. These advantages allow for a more comprehensive evaluation of IMDL methods, extending their applicability to diverse images. We introduce the GIM benchmark with two settings to evaluate existing IMDL methods. In addition, we propose a novel IMDL framework, termed GIMFormer, which consists of a ShadowTracer, a Frequency-Spatial Block (FSB), and a Multi-Window Anomalous Modeling (MWAM) module. Extensive experiments on GIM demonstrate that GIMFormer surpasses the previous state-of-the-art approach on two different benchmarks.
Problem

Research questions and friction points this paper is trying to address.

Image Authenticity
Forgery Detection
AI Manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

GIM Dataset
GIMFormer Framework
Image Integrity Detection
Yirui Chen
Shanghai Jiao Tong University, Huawei Noah’s Ark Lab
Xu Huang
Huawei Noah’s Ark Lab
Quan Zhang
Tsinghua University, Huawei Noah’s Ark Lab
Wei Li
Shanghai Jiao Tong University, Huawei Noah’s Ark Lab
Mingjian Zhu
Huawei Noah’s Ark Lab
Qi Yan
PhD Student, University of British Columbia
machine learning, robotics
Simiao Li
Huawei Noah’s Ark Lab
Hanting Chen
Noah's Ark Lab, Huawei
deep learning, machine learning, computer vision
Hailin Hu
Huawei Noah's Ark Lab
Jie Yang
Shanghai Jiao Tong University
Wei Liu
Shanghai Jiao Tong University
Jie Hu
Huawei Noah’s Ark Lab