🤖 AI Summary
Existing fact-checking datasets suffer from limitations -- unimodality, narrow temporal coverage, shallow evidence, domain imbalance, and insufficient contextual information -- that hinder realistic evaluation of multimodal fact verification. Method: We introduce MMM-Fact, the first large-scale, cross-temporal (1995--2025), multimodal (text/image/video/table) fact-checking dataset, comprising 125,449 claims, each paired with its full fact-check article and heterogeneous, multi-source evidence. We propose a three-tier retrieval-difficulty taxonomy -- Basic, Intermediate, and Advanced -- to quantify verification complexity and enable fine-grained assessment of multi-step, cross-modal reasoning. The dataset integrates resources from four major fact-checking platforms and one leading news outlet, adopts ternary labeling (true/false/not enough information), and includes baseline experiments with mainstream LLMs. Contribution/Results: MMM-Fact substantially raises evaluation rigor; model performance degrades systematically as evidence complexity increases, narrowing the gap between benchmark design and real-world applicability.
📝 Abstract
Misinformation and disinformation demand fact checking that goes beyond simple evidence-based reasoning. Existing benchmarks fall short: they are largely single-modality (text-only), span short time horizons, use shallow evidence, cover domains unevenly, and often omit full articles -- obscuring models' real-world capability. We present MMM-Fact, a large-scale benchmark of 125,449 fact-checked statements (1995--2025) across multiple domains, each paired with the full fact-check article and multimodal evidence (text, images, videos, tables) from four fact-checking sites and one news outlet. To reflect verification effort, each statement is tagged with a retrieval-difficulty tier -- Basic (1--5 sources), Intermediate (6--10), and Advanced (>10) -- supporting fairness-aware evaluation of multi-step, cross-modal reasoning. The dataset adopts a three-class veracity scheme (true/false/not enough information) and enables tasks in veracity prediction, explainable fact-checking, complex evidence aggregation, and longitudinal analysis. Baselines with mainstream LLMs show MMM-Fact is markedly harder than prior resources, with performance degrading as evidence complexity rises. MMM-Fact offers a realistic, scalable benchmark for transparent, reliable, multimodal fact-checking.
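The retrieval-difficulty tiers described above follow a simple bucketing rule over the number of evidence sources per statement. A minimal sketch of that rule is below; the function name and signature are illustrative assumptions, not part of any released MMM-Fact API.

```python
def retrieval_tier(num_sources: int) -> str:
    """Map an evidence-source count to a difficulty tier.

    Boundaries follow the paper's description: Basic (1-5 sources),
    Intermediate (6-10), Advanced (>10). Hypothetical helper for
    illustration only.
    """
    if num_sources < 1:
        raise ValueError("each statement is paired with at least one source")
    if num_sources <= 5:
        return "Basic"
    if num_sources <= 10:
        return "Intermediate"
    return "Advanced"


# Bucket a few hypothetical claims by their source counts.
counts = [3, 7, 12]
print([retrieval_tier(n) for n in counts])  # ['Basic', 'Intermediate', 'Advanced']
```

Tagging tiers this way supports the fairness-aware evaluation the abstract mentions: model accuracy can be reported per tier, making degradation with evidence complexity directly visible.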