DiffSeg30k: A Multi-Turn Diffusion Editing Benchmark for Localized AIGC Detection

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing AIGC detection benchmarks focus on image-level binary classification and are ill-suited for detecting localized forgeries produced by diffusion-based editing. Method: We propose the first fine-grained localization benchmark for multi-round diffusion editing, comprising a large-scale dataset of 30,000 images generated by eight state-of-the-art diffusion models, featuring real-world scenes, multi-model collaborative editing, context-aware editing chains, and vision-language model–generated editing instructions. Our method employs a semantic segmentation model to jointly perform pixel-level localization of edited regions and identification of the generating model. Contribution/Results: The segmentation model achieves high-precision localization while maintaining strong whole-image classification accuracy—significantly outperforming conventional forgery detection approaches—and demonstrates robust cross-generator generalization. This work advances AIGC detection from image-level classification toward pixel-level understanding.

📝 Abstract
Diffusion-based editing enables realistic modification of local image regions, making AI-generated content harder to detect. Existing AIGC detection benchmarks focus on classifying entire images, overlooking the localization of diffusion-based edits. We introduce DiffSeg30k, a publicly available dataset of 30k diffusion-edited images with pixel-level annotations, designed to support fine-grained detection. DiffSeg30k features: 1) In-the-wild images: we collect images or image prompts from COCO to reflect real-world content diversity; 2) Diverse diffusion models: local edits are produced with eight SOTA diffusion models; 3) Multi-turn editing: each image undergoes up to three sequential edits to mimic real-world editing workflows; and 4) Realistic editing scenarios: a vision-language model (VLM)-based pipeline automatically identifies meaningful regions and generates context-aware prompts covering additions, removals, and attribute changes. DiffSeg30k shifts AIGC detection from binary classification to semantic segmentation, enabling simultaneous localization of edits and identification of the editing models. We benchmark three baseline segmentation approaches, revealing significant challenges in the semantic segmentation task, particularly robustness to image distortions. Experiments also reveal that segmentation models, despite being trained for pixel-level localization, emerge as highly reliable whole-image classifiers of diffusion edits, outperforming established forgery classifiers while showing strong potential for cross-generator generalization. We believe DiffSeg30k will advance research in fine-grained localization of AI-generated content by demonstrating the promise and limitations of segmentation-based methods. DiffSeg30k is released at: https://huggingface.co/datasets/Chaos2629/Diffseg30k
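The abstract's observation that segmentation models double as reliable whole-image classifiers follows from a simple reduction: any per-pixel prediction mask can be collapsed into an image-level verdict. A minimal sketch of this reduction, assuming a class convention (0 = authentic pixel, 1..8 = index of the editing diffusion model) that illustrates the idea but is not necessarily the dataset's actual label layout:

```python
import numpy as np

# Assumed convention: 0 = authentic pixel, 1..8 = index of the diffusion
# model that produced the edit at that pixel (DiffSeg30k uses 8 generators).

def mask_to_image_verdict(mask: np.ndarray, min_area: int = 1):
    """Collapse a per-pixel class mask into an image-level decision.

    Returns (is_edited, generator_ids), where generator_ids lists every
    non-authentic class whose edited area meets the min_area threshold.
    """
    classes, counts = np.unique(mask, return_counts=True)
    generator_ids = [int(c) for c, n in zip(classes, counts)
                     if c != 0 and n >= min_area]
    return bool(generator_ids), generator_ids

# Example: a 4x4 image where one 2x2 patch was edited by generator 3.
mask = np.zeros((4, 4), dtype=np.int64)
mask[1:3, 1:3] = 3
print(mask_to_image_verdict(mask))  # (True, [3])
```

The `min_area` threshold is one plausible way to suppress spurious single-pixel predictions before declaring an image edited; the paper's actual evaluation protocol may aggregate differently.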
Problem

Research questions and friction points this paper is trying to address.

Detecting localized AI-generated edits in images
Addressing limitations of binary classification benchmarks
Enabling pixel-level localization of diffusion-based modifications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dataset with pixel-level annotations for fine-grained detection
Multi-turn sequential edits using eight diffusion models
Vision-language model pipeline for realistic editing scenarios
Hai Ci
National University of Singapore; Peking University
Computer Vision · Machine Learning · Trustworthy AI
Ziheng Peng
South China University of Technology
Pei Yang
Show Lab, National University of Singapore
Yingxin Xuan
Show Lab, National University of Singapore
Mike Zheng Shou
Show Lab, National University of Singapore