VeriTaS: The First Dynamic Benchmark for Multimodal Automated Fact-Checking

📅 2026-01-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses critical limitations of existing automated fact-checking benchmarks: they are often constrained in scope, modality, language coverage, and the types of misinformation covered, and their static datasets are further vulnerable to data leakage into the pretraining corpora of large models. To overcome these challenges, we propose VeriTaS, the first dynamic, multimodal fact-checking benchmark, built via a seven-stage automated pipeline that continuously ingests real-world claims from 108 global fact-checking organizations. VeriTaS supports both textual and audiovisual content, incorporates a standardized, decoupled expert-adjudication mapping mechanism, and follows a fully automated quarterly update strategy. The benchmark comprises 24,000 claims across 54 languages, and human evaluations confirm high alignment between the automated annotations and human judgments, yielding a robust, leakage-resistant, and sustainable evaluation platform for the era of large language models.

📝 Abstract
The growing scale of online misinformation urgently demands Automated Fact-Checking (AFC). Existing benchmarks for evaluating AFC systems, however, are largely limited in terms of task scope, modalities, domain, language diversity, realism, or coverage of misinformation types. Critically, they are static, thus subject to data leakage as their claims enter the pretraining corpora of LLMs. As a result, benchmark performance no longer reliably reflects the actual ability to verify claims. We introduce Verified Theses and Statements (VeriTaS), the first dynamic benchmark for multimodal AFC, designed to remain robust under ongoing large-scale pretraining of foundation models. VeriTaS currently comprises 24,000 real-world claims from 108 professional fact-checking organizations across 54 languages, covering textual and audiovisual content. Claims are added quarterly via a fully automated seven-stage pipeline that normalizes claim formulation, retrieves original media, and maps heterogeneous expert verdicts to a novel, standardized, and disentangled scoring scheme with textual justifications. Through human evaluation, we demonstrate that the automated annotations closely match human judgments. We commit to update VeriTaS in the future, establishing a leakage-resistant benchmark, supporting meaningful AFC evaluation in the era of rapidly evolving foundation models. We will make the code and data publicly available.
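The abstract describes mapping heterogeneous expert verdicts from many fact-checking organizations onto a single standardized scoring scheme. As a toy illustration of that idea only (the label sets, scores, and function names below are invented for this sketch and are not the paper's actual scheme), such a mapping might look like:

```python
# Hypothetical sketch: normalize organization-specific verdict labels onto
# one shared numeric veracity scale (assumption: 0.0 = false, 1.0 = true).
# All labels and scores here are illustrative, not VeriTaS's real scheme.

VERDICT_MAP = {
    "pants on fire": 0.0,
    "false": 0.0,
    "mostly false": 0.25,
    "half true": 0.5,
    "mostly true": 0.75,
    "true": 1.0,
}

def standardize(raw_verdict: str) -> float:
    """Normalize a raw verdict label and map it to the shared scale."""
    key = raw_verdict.strip().lower()
    if key not in VERDICT_MAP:
        raise ValueError(f"Unmapped verdict label: {raw_verdict!r}")
    return VERDICT_MAP[key]

print(standardize("  Mostly True "))  # 0.75
```

A real pipeline would additionally need per-organization label tables and, per the abstract, textual justifications alongside the scores; this sketch only shows the normalization step.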
Problem

Research questions and friction points this paper is trying to address.

Automated Fact-Checking
multimodal
benchmark
data leakage
misinformation
Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic benchmark
multimodal fact-checking
data leakage resistance
automated annotation pipeline
standardized verdict scoring
Mark Rothermel
Multimodal AI Lab, Technical University of Darmstadt, hessian.AI
Marcus Kornmann
Multimodal AI Lab, Technical University of Darmstadt, hessian.AI
Marcus Rohrbach
Professor for Multimodal Reliable AI, TU Darmstadt, Germany
Machine Learning · Computer Vision · AI
Anna Rohrbach
Professor, TU Darmstadt, Germany
Vision and Language · Artificial Intelligence · Multimodal Grounded Learning