Perception, Understanding and Reasoning: A Multimodal Benchmark for Video Fake News Detection

πŸ“… 2025-10-28
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing video fake news detection (VFND) benchmarks evaluate only final decision accuracy, lacking fine-grained interpretability analysis of the perceptual, comprehension, and reasoning processes involved. To address this, we propose MVFNDB, a multimodal VFND benchmark comprising 10 fine-grained tasks and 9,730 human-annotated questions, along with a capability taxonomy and a staged evaluation framework. We introduce MVFND-CoT, a novel chain-of-thought reasoning paradigm that jointly models creator intent and raw visual features, enhanced by video–text alignment and multi-feature fusion. Extensive experiments demonstrate that this framework significantly improves both detection performance and model interpretability. MVFNDB is the first systematically designed, high-quality, human-annotated benchmark enabling comprehensive capability assessment and mechanistic analysis of multimodal large language models (MLLMs) in VFND.

πŸ“ Abstract
The advent of multi-modal large language models (MLLMs) has greatly advanced research into video fake news detection (VFND). Traditional video-based FND benchmarks typically focus on the accuracy of the final decision and fail to provide fine-grained assessment of the entire detection process, leaving that process a black box. We therefore introduce MVFNDB (Multi-modal Video Fake News Detection Benchmark), grounded in an empirical analysis that provides the foundation for task definition. The benchmark comprises 10 tasks and is meticulously crafted to probe MLLMs' perception, understanding, and reasoning capacities during detection, featuring 9,730 human-annotated video-related questions built on a carefully constructed VFND capability taxonomy. To validate the impact of combining multiple features on the final result, we design a novel framework named MVFND-CoT, which incorporates reasoning over both creator-added content and original shooting footage. Building upon the benchmark, we conduct an in-depth analysis of the deeper factors influencing accuracy, including video processing strategies and the alignment between video features and model capabilities. We believe this benchmark will lay a solid foundation for future evaluations and advancements of MLLMs in the domain of video fake news detection.
Problem

Research questions and friction points this paper is trying to address.

Evaluating multimodal models' perception, understanding, and reasoning in fake news detection
Providing fine-grained assessment beyond final accuracy for detection processes
Analyzing video feature alignment and processing strategies' impact on detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces MVFNDB benchmark with 10 tasks
Designs MVFND-CoT framework combining multiple features
Analyzes video processing strategies and feature alignment
πŸ”Ž Similar Papers
No similar papers found.
Yakun Cui
The Hong Kong University of Science and Technology
Fushuo Huo
The Hong Kong Polytechnic University
Large Vision Language Model · Multimodal Learning · Trustworthy AI
Weijie Shi
The Hong Kong University of Science and Technology
Juntao Dai
Peking University
Hang Du
Beijing University of Posts and Telecommunications
Zhenghao Zhu
The Hong Kong University of Science and Technology
Sirui Han
The Hong Kong University of Science and Technology
Large Language Model · Interdisciplinary Artificial Intelligence
Yike Guo
The Hong Kong University of Science and Technology