SciFigDetect: A Benchmark for AI-Generated Scientific Figure Detection

📅 2026-04-09
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge that existing AI-generated image detectors struggle to identify structured, text-dense, and semantically precise scientific figures synthesized by generative models, largely because no dedicated benchmark exists for this setting. To bridge the gap, the authors construct the first detection benchmark tailored to the task, using an agent-driven data pipeline that automatically collects academic papers, parses multimodal content, generates structured prompts, and synthesizes corresponding figures. Through iterative reviewer-in-the-loop filtering, they curate a diverse dataset of real–synthetic figure pairs spanning multiple categories and generative sources. Experiments show that current detectors suffer significant performance degradation under zero-shot transfer, cross-generator generalization, and image degradation, revealing their limited robustness and underscoring the benchmark's role in advancing research on detecting AI-generated scientific imagery.
๐Ÿ“ Abstract
Modern multimodal generators can now produce scientific figures at near-publishable quality, creating a new challenge for visual forensics and research integrity. Unlike conventional AI-generated natural images, scientific figures are structured, text-dense, and tightly aligned with scholarly semantics, making them a distinct and difficult detection target. However, existing AI-generated image detection benchmarks and methods are almost entirely developed for open-domain imagery, leaving this setting largely unexplored. We present the first benchmark for AI-generated scientific figure detection. To construct it, we develop an agent-based data pipeline that retrieves licensed source papers, performs multimodal understanding of paper text and figures, builds structured prompts, synthesizes candidate figures, and filters them through a review-driven refinement loop. The resulting benchmark covers multiple figure categories, multiple generation sources, and aligned real–synthetic pairs. We benchmark representative detectors under zero-shot, cross-generator, and degraded-image settings. Results show that current methods fail dramatically in zero-shot transfer, exhibit strong generator-specific overfitting, and remain fragile under common post-processing corruptions. These findings reveal a substantial gap between existing AIGI detection capabilities and the emerging distribution of high-quality scientific figures. We hope this benchmark can serve as a foundation for future research on robust and generalizable scientific-figure forensics. The dataset is available at https://github.com/Joyce-yoyo/SciFigDetect.
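The cross-generator setting described in the abstract (train a detector on figures from one generation source, score it on every other) can be sketched as a small evaluation loop. This is a minimal illustration, not the benchmark's released code: the function and variable names (`train_fn`, `test_sets`, the generator labels) are assumptions for the sketch.

```python
# Hedged sketch of a cross-generator evaluation protocol.
# Assumption: each test set is a list of (image, label) pairs,
# with label 1 = synthetic figure and 0 = real figure.
from itertools import product

def evaluate(detector, samples):
    """Fraction of (image, label) pairs the detector classifies correctly."""
    return sum(1 for img, label in samples if detector(img) == label) / len(samples)

def cross_generator_matrix(train_fn, test_sets):
    """Fit a detector on each generation source and score it on every source.

    Low off-diagonal cells of the resulting matrix indicate the
    generator-specific overfitting the paper reports.
    """
    matrix = {}
    for train_g, test_g in product(test_sets, repeat=2):
        detector = train_fn(test_sets[train_g])  # train on one source
        matrix[(train_g, test_g)] = evaluate(detector, test_sets[test_g])
    return matrix
```

With real data, `train_fn` would fit an actual detector; here any callable mapping a training set to a classifier works, which makes the protocol easy to unit-test with stubs.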
Problem

Research questions and friction points this paper is trying to address.

AI-generated scientific figures
visual forensics
research integrity
detection benchmark
multimodal generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

scientific figure detection
AI-generated image benchmark
multimodal understanding
agent-based data pipeline
research integrity
You Hu
Zhejiang University
Chenzhuo Zhao
Independent Researcher
Changfa Mo
Zhejiang University
Haotian Liu
University of Oulu
Xiaobai Li
IEEE Senior Member, ZJU100 Professor, Zhejiang University
Computer Vision - Affective computing - Biometrics