🤖 AI Summary
Existing multimodal fact-checking datasets suffer from limited scale, narrow language and task coverage, evidence leakage, and reliance on external news collections for sourcing true claims. To address these limitations, we introduce M4FC, a new large-scale multimodal, multilingual, and multicultural real-world fact-checking dataset comprising 4,982 images paired with 6,980 claims in ten languages, all verified by professional fact-checkers from 22 organizations. M4FC spans six fact-checking tasks: visual claim extraction, claimant intent prediction, fake detection, image contextualization, location verification, and verdict prediction. We provide baseline results for all six tasks and analyze how combining intermediate tasks affects downstream verdict prediction performance. M4FC establishes a new benchmark for automated, real-world fact-checking across diverse cultural and geographic contexts.
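To make the task-cascading idea concrete, here is a minimal hypothetical sketch of how predictions from the five intermediate tasks could be folded into the input of the final verdict-prediction step. All names, field layouts, and the verdict label set below are illustrative assumptions, not the paper's actual pipeline or schema.

```python
# Hypothetical sketch of cascading intermediate M4FC task outputs into
# verdict prediction. Every name and structure here is an illustrative
# assumption, not the paper's released code or data format.
from dataclasses import dataclass


@dataclass
class IntermediateOutputs:
    visual_claim: str        # output of visual claim extraction
    claimant_intent: str     # output of claimant intent prediction
    is_fake_image: bool      # output of fake detection
    image_context: str       # output of image contextualization
    location_verified: bool  # output of location verification


def build_verdict_input(claim: str, inter: IntermediateOutputs) -> str:
    """Fold intermediate predictions into a single verdict-prediction input."""
    return (
        f"Claim: {claim}\n"
        f"Extracted visual claim: {inter.visual_claim}\n"
        f"Claimant intent: {inter.claimant_intent}\n"
        f"Image manipulated: {inter.is_fake_image}\n"
        f"Image context: {inter.image_context}\n"
        f"Location consistent: {inter.location_verified}\n"
        "Verdict?"  # label set is dataset-specific; omitted here
    )


# Illustrative usage with placeholder values.
outputs = IntermediateOutputs(
    visual_claim="A flooded street in a coastal city",
    claimant_intent="mislead",
    is_fake_image=False,
    image_context="News coverage of a 2019 storm",
    location_verified=False,
)
print(build_verdict_input("Photo shows yesterday's flood", outputs))
```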
📝 Abstract
Existing real-world datasets for multimodal automated fact-checking have multiple limitations: they contain few instances, focus on only one or two languages and tasks, suffer from evidence leakage, or depend on external sets of news articles for sourcing true claims. To address these shortcomings, we introduce M4FC, a new real-world dataset comprising 4,982 images paired with 6,980 claims. The images, verified by professional fact-checkers from 22 organizations, represent diverse cultural and geographic contexts. Each claim is available in one or two out of ten languages. M4FC spans six multimodal fact-checking tasks: visual claim extraction, claimant intent prediction, fake detection, image contextualization, location verification, and verdict prediction. We provide baseline results for all tasks and analyze how combining intermediate tasks influences downstream verdict prediction performance. We make our dataset and code available.
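As a rough aid for readers, the dataset description above suggests a per-instance layout along the following lines. This is a hedged guess at a schema, assuming a JSON-style release; the field names and file format are assumptions for illustration only, not the dataset's real structure.

```python
# Hypothetical shape of one M4FC instance. Field names are illustrative
# guesses based on the abstract, not the dataset's actual schema.
import json

record = {
    "image_path": "images/0001.jpg",          # placeholder path
    "claims": [
        {"text": "...", "language": "en"},    # each claim appears in one
        {"text": "...", "language": "hi"},    # or two of ten languages
    ],
    "fact_checker_org": "...",                # one of 22 organizations
    "labels": {                               # task-specific annotations
        "fake_detection": None,
        "location_verification": None,
        "verdict": None,
    },
}
print(json.dumps(record, ensure_ascii=False, indent=2))
```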