🤖 AI Summary
Existing image forgery detection benchmarks are limited in scale and diversity, which hurts model robustness. To address this, we propose DiQuID, a large-scale, high-diversity benchmark for inpainting-based forgeries comprising over 95,000 inpainted images. DiQuID is built through a three-stage methodology: Semantically Aligned Object Replacement (SAOR), Multiple Model Image Inpainting (MMII), and Uncertainty-Guided Deceptiveness Assessment (UGDA). By combining instance-segmentation-based object localization, multiple diffusion-based inpainting pipelines, and comparative realism assessment against the originals, DiQuID achieves high data authenticity and rigorous evaluation. The resulting dataset surpasses existing benchmarks in diversity, aesthetic quality, and technical quality. Moreover, detectors trained on DiQuID remain accurate even on highly deceptive forgeries that humans cannot reliably distinguish from real images, improving generalization to real-world scenarios and advancing the field of forgery detection.
📝 Abstract
Recent advances in generative models enable highly realistic image manipulations, creating an urgent need for robust forgery detection methods. Current datasets for training and evaluating these methods are limited in scale and diversity. To address this, we propose a methodology for creating high-quality inpainting datasets and apply it to create DiQuID, comprising over 95,000 inpainted images generated from 78,000 original images sourced from MS-COCO, RAISE, and OpenImages. Our methodology consists of three components: (1) Semantically Aligned Object Replacement (SAOR) that identifies suitable objects through instance segmentation and generates contextually appropriate prompts, (2) Multiple Model Image Inpainting (MMII) that employs various state-of-the-art inpainting pipelines primarily based on diffusion models to create diverse manipulations, and (3) Uncertainty-Guided Deceptiveness Assessment (UGDA) that evaluates image realism through comparative analysis with originals. The resulting dataset surpasses existing ones in diversity, aesthetic quality, and technical quality. We provide comprehensive benchmarking results using state-of-the-art forgery detection methods, demonstrating the dataset's effectiveness in evaluating and improving detection algorithms. Through a human study with 42 participants on 1,000 images, we show that while humans struggle with images classified as deceiving by our methodology, models trained on our dataset maintain high performance on these challenging cases. Code and dataset are available at https://github.com/mever-team/DiQuID.
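The three components described above (SAOR, MMII, UGDA) form a per-image pipeline. A minimal sketch of that flow is shown below; all function names, signatures, and the threshold are hypothetical illustrations, not the authors' actual API (see the GitHub repository for the real implementation):

```python
# Hypothetical sketch of DiQuID's three-stage dataset-creation flow.
# The segmentation, prompting, inpainting, and scoring callables are
# stand-ins for the real models used by the authors.
from dataclasses import dataclass, field

@dataclass
class Sample:
    image_id: str
    masks: list = field(default_factory=list)      # instance masks (SAOR)
    prompt: str = ""                               # contextual prompt (SAOR)
    inpainted: dict = field(default_factory=dict)  # model name -> inpainted image (MMII)
    deceiving: bool = False                        # realism verdict (UGDA)

def saor(sample, segment, make_prompt):
    """Semantically Aligned Object Replacement: pick suitable objects
    via instance segmentation and build a contextually appropriate prompt."""
    sample.masks = segment(sample.image_id)
    sample.prompt = make_prompt(sample.masks)
    return sample

def mmii(sample, pipelines):
    """Multiple Model Image Inpainting: run several (diffusion-based)
    inpainting pipelines to produce diverse manipulations."""
    for name, run in pipelines.items():
        sample.inpainted[name] = run(sample.image_id, sample.masks, sample.prompt)
    return sample

def ugda(sample, realism_score, threshold=0.5):
    """Uncertainty-Guided Deceptiveness Assessment: score each forgery
    against the original and flag the sample as deceiving if any passes."""
    scores = [realism_score(sample.image_id, img) for img in sample.inpainted.values()]
    sample.deceiving = bool(scores) and max(scores) >= threshold
    return sample
```

In practice each stage would wrap heavyweight models (a segmenter, a captioner, several inpainting diffusion pipelines, a realism scorer); the sketch only captures how their outputs chain together per image.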