🤖 AI Summary
Multimodal disinformation (authentic images paired with misleading captions) poses a challenge for large language models and vision-language models (LLMs/LVLMs), which often hallucinate when they lack contextual grounding and must rely on parametric knowledge alone.
Method: We propose a dual-graph structured verification framework: a *claim graph* derived from the image caption and an *evidence graph* constructed from retrieved external textual sources. An attention-enhanced graph neural network compares the two graphs, grounding the image-caption consistency check in explicit external evidence rather than parametric knowledge alone, which curbs hallucination. The framework uses lightweight, task-specific graph encoders trained with contrastive learning.
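To make the pipeline concrete, here is a minimal sketch of the dual-graph consistency scorer, assuming PyTorch Geometric. The class names (`GraphEncoder`, `ConsistencyScorer`), feature dimensions, and margin-based contrastive loss are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of a dual-graph consistency scorer.
# Assumes PyTorch + PyTorch Geometric; names and dimensions are illustrative.
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GATConv, global_mean_pool


class GraphEncoder(torch.nn.Module):
    """Attention-based GNN that embeds one graph into a fixed-size vector."""

    def __init__(self, in_dim: int, hidden_dim: int, heads: int = 4):
        super().__init__()
        self.conv1 = GATConv(in_dim, hidden_dim, heads=heads)
        self.conv2 = GATConv(hidden_dim * heads, hidden_dim, heads=1)

    def forward(self, x, edge_index, batch):
        x = F.elu(self.conv1(x, edge_index))
        x = self.conv2(x, edge_index)
        return global_mean_pool(x, batch)  # one embedding per graph


class ConsistencyScorer(torch.nn.Module):
    """Encodes the claim and evidence graphs and scores their consistency."""

    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        self.claim_enc = GraphEncoder(in_dim, hidden_dim)
        self.evidence_enc = GraphEncoder(in_dim, hidden_dim)

    def forward(self, claim: Data, evidence: Data) -> torch.Tensor:
        z_c = self.claim_enc(claim.x, claim.edge_index, claim.batch)
        z_e = self.evidence_enc(evidence.x, evidence.edge_index, evidence.batch)
        return F.cosine_similarity(z_c, z_e)  # high score = consistent pair


def contrastive_loss(score, label, margin: float = 0.5):
    """Pull matched pairs (label=1) together; push OOC pairs (label=0) apart."""
    return (label * (1 - score) + (1 - label) * F.relu(score - margin)).mean()


# Toy usage with random 16-dim node features and tiny chain graphs.
claim = Data(x=torch.randn(3, 16), edge_index=torch.tensor([[0, 1], [1, 2]]),
             batch=torch.zeros(3, dtype=torch.long))
evidence = Data(x=torch.randn(4, 16), edge_index=torch.tensor([[0, 1, 2], [1, 2, 3]]),
                batch=torch.zeros(4, dtype=torch.long))
model = ConsistencyScorer(in_dim=16, hidden_dim=32)
score = model(claim, evidence)
loss = contrastive_loss(score, torch.tensor([1.0]))  # 1 = pristine pair
```

In this sketch each graph is pooled to a single embedding, and the cosine similarity between the claim and evidence embeddings serves as the consistency score; the contrastive objective pulls matched pairs together and pushes out-of-context pairs below the margin.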
Contribution/Results: Our method achieves 93.05% accuracy on the evaluation set, outperforming the second-best method (an LLM) by 2.82%. This demonstrates that structured, graph-based reasoning improves both accuracy and robustness in multimodal misinformation detection.
📝 Abstract
Multimodal out-of-context (OOC) misinformation repurposes real images with unrelated or misleading captions. Detecting it is challenging because the context of the claim must be resolved before the claim itself can be checked. Many current methods, including LLMs and LVLMs, skip this contextualization step, and LLMs hallucinate in the absence of context or parametric knowledge. In this work, we propose a graph-based method that evaluates the consistency between the image and the caption by constructing two graph representations: an evidence graph, derived from online textual evidence, and a claim graph, derived from the claim in the caption. Using graph neural networks (GNNs) to encode and compare these representations, our framework evaluates the truthfulness of image-caption pairs. We create datasets for our graph-based method and evaluate our baseline model against popular LLMs on the misinformation detection task. Our method scores 93.05% detection accuracy on the evaluation set and outperforms the second-best performing method (an LLM) by 2.82%, making a case for smaller, task-specific methods.
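As a rough illustration of the graph-construction step, the sketch below builds a toy claim graph from a caption, using named entities as nodes and sentence-level co-occurrence as edges. The helper `build_claim_graph` and this particular construction are assumptions for illustration; the paper's exact extraction may differ.

```python
# Hedged sketch of one plausible claim-graph construction: named entities as
# nodes, sentence co-occurrence as edges. The paper's extraction may differ.
import itertools

import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline (NER + parser)


def build_claim_graph(caption: str) -> nx.Graph:
    doc = nlp(caption)
    graph = nx.Graph()
    for sent in doc.sents:
        ents = [ent.text for ent in sent.ents]
        graph.add_nodes_from(ents)
        # Connect every pair of entities mentioned in the same sentence.
        for a, b in itertools.combinations(ents, 2):
            graph.add_edge(a, b)
    return graph


g = build_claim_graph("Pope Francis visits flood victims in Manila in 2015.")
print(g.nodes(), g.edges())
```

The same procedure, applied to retrieved evidence passages, would yield the evidence graph; node features for a downstream GNN could then come from, e.g., pretrained text embeddings of the entity mentions (another assumption).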