🤖 AI Summary
Real-world scenarios often involve coexisting multi-source manipulations, yet existing fake news detection methods predominantly assume single-source, single-modality falsifications and lack benchmarks for mixed-source multimodal misinformation.
Method: We introduce MMFakeBench—the first mixed-source multimodal fake news detection benchmark—covering three distortion categories (textual veracity, visual veracity, and cross-modal consistency) and 12 fine-grained subtypes, enabling zero-shot evaluation of large vision-language models (LVLMs) and dedicated detectors. We formally define mixed-source multimodal misinformation and propose MMD-Agent, an LVLM-based agent framework featuring multi-step reasoning and tool-augmented inference for fine-grained distortion modeling and generalized detection.
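The staged detection idea can be pictured as a pipeline that screens a text-image pair against each distortion source in turn. The sketch below is a minimal illustration, not the paper's implementation: the `Sample` fields, checker names, and the toy rules inside each checker are all hypothetical stand-ins for what would be LVLM reasoning steps (possibly tool-augmented, e.g. a retrieval or image-forensics call).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Sample:
    text: str
    image_caption: str  # stand-in for actual image content

# A checker pairs a distortion label with a predicate over the sample.
# In the real system each predicate would be an LVLM (+tool) judgment.
Checker = Tuple[str, Callable[[Sample], bool]]

def detect(sample: Sample, checkers: List[Checker]) -> str:
    """Run distortion checks in sequence; return the first detected
    distortion label, or 'real' if none fires."""
    for label, check in checkers:
        if check(sample):
            return label
    return "real"

# Toy stubs mirroring the benchmark's three distortion sources.
checkers: List[Checker] = [
    ("textual_veracity", lambda s: "miracle cure" in s.text.lower()),
    ("visual_veracity", lambda s: "[synthetic]" in s.image_caption),
    ("cross_modal_inconsistency",
     lambda s: s.text.split()[0].lower() not in s.image_caption.lower()),
]
```

The ordering matters: a sample is attributed to the first distortion source that fires, which keeps the mixed-source label space disjoint even when several checks could trigger.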
Results: Evaluating 15 LVLMs and 6 detection methods on MMFakeBench reveals substantial performance degradation under mixed-source conditions. MMD-Agent achieves an average accuracy gain of 12.7% and significantly improves cross-distortion generalization.
📝 Abstract
Current multimodal misinformation detection (MMD) methods often assume a single source and type of forgery for each sample, which is insufficient for real-world scenarios where multiple forgery sources coexist. The lack of a benchmark for mixed-source misinformation has hindered progress in this field. To address this, we introduce MMFakeBench, the first comprehensive benchmark for mixed-source MMD. MMFakeBench includes 3 critical sources: textual veracity distortion, visual veracity distortion, and cross-modal consistency distortion, along with 12 sub-categories of misinformation forgery types. We further conduct an extensive evaluation of 6 prevalent detection methods and 15 Large Vision-Language Models (LVLMs) on MMFakeBench under a zero-shot setting. The results indicate that current methods struggle under this challenging and realistic mixed-source MMD setting. Additionally, we propose MMD-Agent, a novel approach to integrate the reasoning, action, and tool-use capabilities of LVLM agents, significantly enhancing accuracy and generalization. We believe this study will catalyze future research into more realistic mixed-source multimodal misinformation and provide a fair evaluation of misinformation detection methods.