Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models

📅 2025-03-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing large vision-language models (LVLMs) lack a comprehensive benchmark tailored to synthetic media detection. This paper introduces Forensics-Bench, the first LVLM-specific benchmark for synthetic media detection, encompassing 112 forgery categories organized along a five-dimensional taxonomy (semantics, modality, task, forgery type, and generative model) and comprising 63,292 multiple-choice visual question-answering items that jointly evaluate recognition, localization, and reasoning capabilities. We propose forgery-centric prompt engineering, fine-grained forgery ontology modeling, and a cross-model unified evaluation protocol. Systematic evaluation of 22 open-source and 3 proprietary state-of-the-art LVLMs reveals pervasive performance bottlenecks in complex forgery scenarios. Forensics-Bench is currently the largest LVLM-specific forensic evaluation dataset and the first framework enabling multidimensional, fine-grained analysis of synthetic media detection capabilities.
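The summary describes the benchmark as 63,292 multiple-choice items, each tagged along a five-dimensional taxonomy and scored against a ground-truth option. A minimal sketch of how such an item and its accuracy scoring might look is given below; the field names, taxonomy labels, and `predict` interface are assumptions for illustration, not the benchmark's actual schema.

```python
# Hypothetical sketch of a Forensics-Bench-style multiple-choice item
# and accuracy scoring. Field names and taxonomy labels are assumed,
# not taken from the released benchmark.
from dataclasses import dataclass, field

@dataclass
class ForgeryItem:
    question: str                # prompt shown to the LVLM
    choices: list                # answer options
    answer: str                  # ground-truth option letter
    taxonomy: dict = field(default_factory=dict)  # 5-dimension labels

def score(items, predict):
    """Accuracy over multiple-choice items; `predict` maps an item
    to the option letter chosen by the model under test."""
    correct = sum(1 for it in items if predict(it) == it.answer)
    return correct / len(items)

items = [
    ForgeryItem(
        question="Is this face image real or AI-generated?",
        choices=["A. Real", "B. AI-generated"],
        answer="B",
        taxonomy={"semantics": "face", "modality": "image",
                  "task": "recognition", "type": "entire synthesis",
                  "model": "diffusion"},
    ),
]

# A trivial baseline that always answers "B":
print(score(items, lambda it: "B"))  # 1.0
```

Because each item carries taxonomy labels, accuracy can also be grouped per dimension (e.g. per forgery type), which is the kind of fine-grained breakdown the summary attributes to the benchmark.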

📝 Abstract
Recently, the rapid development of AIGC has significantly boosted the diversity of fake media spreading on the Internet, posing unprecedented threats to social security, politics, and law. To detect the increasingly diverse malicious fake media in the new era of AIGC, recent studies have proposed exploiting Large Vision Language Models (LVLMs) to design robust forgery detectors, owing to their impressive performance on a wide range of multimodal tasks. However, a comprehensive benchmark designed to assess LVLMs' discerning capabilities on forged media is still lacking. To fill this gap, we present Forensics-Bench, a new forgery detection evaluation benchmark suite that assesses LVLMs across a massive range of forgery detection tasks, requiring comprehensive recognition, localization, and reasoning capabilities on diverse forgeries. Forensics-Bench comprises 63,292 meticulously curated multiple-choice visual questions, covering 112 unique forgery detection types from 5 perspectives: forgery semantics, forgery modalities, forgery tasks, forgery types, and forgery models. We conduct thorough evaluations of 22 open-source LVLMs and 3 proprietary models (GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet), highlighting the significant challenges of comprehensive forgery detection posed by Forensics-Bench. We anticipate that Forensics-Bench will motivate the community to advance the frontier of LVLMs toward all-around forgery detectors in the era of AIGC. The deliverables will be updated at https://Forensics-Bench.github.io/.
Problem

Research questions and friction points this paper is trying to address.

Lack of a comprehensive benchmark for assessing LVLMs' forgery detection capabilities.
Need for robust forgery detectors in the AIGC era.
No existing suite jointly tests recognition, localization, and reasoning across diverse forgery detection tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Forensics-Bench: comprehensive forgery detection benchmark
63,292 multi-choice visual questions for LVLMs
Evaluates 22 open-source and 3 proprietary LVLMs
Jin Wang
The University of Hong Kong, HKU Shanghai Intelligent Computing Research Center
Chenghui Lv
Zhejiang Laboratory, Hangzhou Institute for Advanced Study
Xian Li
Zhejiang University, Zhejiang Laboratory
Shichao Dong
Nanyang Technological University
Huadong Li
MEGVII Technology
Kelu Yao
Zhejiang Laboratory
Chao Li
Zhejiang Laboratory
Wenqi Shao
Researcher at Shanghai AI Laboratory
Foundation Model Evaluation, LLM Compression, Efficient Adaptation, Multimodal Learning
Ping Luo
National University of Defense Technology
Distributed Computing