🤖 AI Summary
The increasing photorealism of AI-generated images blurs the boundary between authentic and synthetic content, while existing vision-language model (VLM)-based detection methods lack robustness in identifying subtle artificial artifacts. To address this, we propose ZoomIn, a two-stage interpretable digital forensics framework: (1) coarse-grained localization of suspicious regions, followed by (2) fine-grained magnification and analysis of local anomalies, mimicking human visual inspection to jointly improve detection accuracy and interpretability. To support training and evaluation, we introduce MagniFake, a high-quality forgery dataset annotated with bounding boxes and forensic explanations, built with an automated VLM-driven data-generation pipeline. Experiments demonstrate that ZoomIn achieves 96.39% accuracy across diverse multi-source test sets, exhibits strong generalization, and generates natural-language explanations grounded in visual evidence.
📝 Abstract
The rapid growth of AI-generated imagery has blurred the boundary between real and synthetic content, raising critical concerns for digital integrity. Vision-language models (VLMs) offer interpretability through explanations but often fail to detect subtle artifacts in high-quality synthetic images. We propose ZoomIn, a two-stage forensic framework that improves both accuracy and interpretability. Mimicking human visual inspection, ZoomIn first scans an image to locate suspicious regions and then performs a focused analysis on these zoomed-in areas to deliver a grounded verdict. To support training, we introduce MagniFake, a dataset of 20,000 real and high-quality synthetic images annotated with bounding boxes and forensic explanations, generated through an automated VLM-based pipeline. Our method achieves 96.39% accuracy with robust generalization, while providing human-understandable explanations grounded in visual evidence.
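The two-stage inspection described above can be sketched in code. This is a minimal illustration only: the function names, the box format, and the stubbed VLM calls are all hypothetical stand-ins, since the paper's actual model, prompts, and aggregation logic are not given here.

```python
# Hedged sketch of a ZoomIn-style two-stage pipeline. Both "VLM" calls
# below are stubs; a real system would query a vision-language model.

def locate_suspicious_regions(image):
    """Stage 1 (stub): scan the full image and propose bounding boxes
    (x0, y0, x1, y1) around regions that look synthetic."""
    h, w = len(image), len(image[0])
    # Hypothetical output: flag the centre quarter of the image.
    return [(w // 4, h // 4, 3 * w // 4, 3 * h // 4)]

def crop(image, box):
    """Magnify a flagged region by cropping it for focused analysis."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

def analyze_region(patch):
    """Stage 2 (stub): inspect the zoomed-in patch and return a verdict
    plus a natural-language explanation grounded in that patch."""
    return {"fake": True, "explanation": "inconsistent local texture"}

def zoomin_verdict(image):
    """Full pipeline: locate -> zoom in -> analyze -> aggregate."""
    regions = locate_suspicious_regions(image)
    findings = [analyze_region(crop(image, box)) for box in regions]
    is_fake = any(f["fake"] for f in findings)
    evidence = [f["explanation"] for f in findings if f["fake"]]
    return {"fake": is_fake, "evidence": evidence, "regions": regions}

# Toy 8x8 grid of zeros standing in for pixel data.
image = [[0] * 8 for _ in range(8)]
result = zoomin_verdict(image)
```

The key design point this sketch captures is that the final verdict is tied to specific regions and their explanations, rather than a single whole-image score, which is what makes the output interpretable.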