🤖 AI Summary
Existing large vision-language models (LVLMs) lack a comprehensive benchmark tailored for synthetic media detection. This paper introduces Forensics-Bench, the first LVLM-specific benchmark for synthetic media detection, encompassing 112 forgery types organized along a five-dimensional taxonomy (forgery semantics, modalities, tasks, types, and generative models) and comprising 63,292 multiple-choice visual question-answering items that jointly evaluate recognition, localization, and reasoning capabilities. The benchmark is built on forgery-centric prompt engineering, fine-grained forgery ontology modeling, and a unified cross-model evaluation protocol. Systematic evaluation of 22 open-source and 3 proprietary state-of-the-art LVLMs reveals pervasive performance bottlenecks in complex forgery scenarios. Forensics-Bench is currently the largest LVLM-specific forensic evaluation benchmark and the first framework enabling multidimensional, fine-grained analysis of synthetic media detection capabilities.
📝 Abstract
Recently, the rapid development of AIGC has significantly increased the diversity of fake media spreading on the Internet, posing unprecedented threats to society, politics, and law. To detect the increasingly diverse malicious fake media in the new era of AIGC, recent studies have proposed exploiting Large Vision-Language Models (LVLMs) to build robust forgery detectors, owing to their impressive performance on a wide range of multimodal tasks. However, a comprehensive benchmark for assessing LVLMs' ability to discern forged media is still lacking. To fill this gap, we present Forensics-Bench, a new forgery detection evaluation benchmark suite that assesses LVLMs across a broad range of forgery detection tasks requiring recognition, localization, and reasoning capabilities on diverse forgeries. Forensics-Bench comprises 63,292 meticulously curated multiple-choice visual questions, covering 112 unique forgery detection types from 5 perspectives: forgery semantics, forgery modalities, forgery tasks, forgery types, and forgery models. We conduct thorough evaluations of 22 open-source LVLMs and 3 proprietary models (GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet), highlighting the significant challenges of comprehensive forgery detection posed by Forensics-Bench. We anticipate that Forensics-Bench will motivate the community to advance the frontier of LVLMs, striving for all-around forgery detectors in the era of AIGC. The deliverables will be updated at https://Forensics-Bench.github.io/.