Effectiveness of Large Multimodal Models in Detecting Disinformation: Experimental Results

📅 2025-09-26
đŸ€– AI Summary
This study addresses the challenge of multimodal fake news detection from image-text pairs. Methodologically, it proposes a systematic analysis framework powered by GPT-4o, integrating prompt engineering optimization, joint image-text preprocessing, and a structured multimodal reasoning pipeline. A six-dimensional fine-grained evaluation taxonomy, equipped with an internal self-assessment mechanism, is introduced for classification. Crucially, the framework incorporates a dual-dimension stability assessment based on prediction confidence and inter-prediction variability, markedly enhancing interpretability and robustness. Extensive experiments across five heterogeneous benchmarks (GossipCop, PolitiFact, Fakeddit, MMFakeBench, and AMMEBA) demonstrate superior cross-domain detection accuracy and result consistency over baseline methods. Moreover, the analysis explicitly reveals current large multimodal models' limitations in semantic contradiction identification and context-dependent reasoning. The framework provides a reproducible, principled methodology for trustworthy multimodal AI.

📝 Abstract
The proliferation of disinformation, particularly in multimodal contexts combining text and images, presents a significant challenge across digital platforms. This study investigates the potential of large multimodal models (LMMs) in detecting and mitigating false information. We approach multimodal disinformation detection by leveraging the advanced capabilities of the GPT-4o model. Our contributions include: (1) the development of an optimized prompt incorporating advanced prompt engineering techniques to ensure precise and consistent evaluations; (2) the implementation of a structured framework for multimodal analysis, including a preprocessing methodology for images and text to comply with the model's token limitations; (3) the definition of six specific evaluation criteria that enable a fine-grained classification of content, complemented by a self-assessment mechanism based on confidence levels; (4) a comprehensive performance analysis of the model across multiple heterogeneous datasets (GossipCop, PolitiFact, Fakeddit, MMFakeBench, and AMMEBA), highlighting GPT-4o's strengths and limitations in disinformation detection; (5) an investigation of prediction variability through repeated testing, evaluating the stability and reliability of the model's classifications; and (6) the introduction of confidence-level and variability-based evaluation methods. These contributions provide a robust and reproducible methodological framework for automated multimodal disinformation analysis.
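The variability-based evaluation described in contributions (5) and (6) can be illustrated with a small sketch: the same image-text pair is classified several times, and the majority label, inter-prediction agreement, and mean self-reported confidence are aggregated. The `predict` callable and its return shape are illustrative assumptions; the paper's actual GPT-4o prompt and scoring scheme are not reproduced here.

```python
from collections import Counter

def assess_stability(predict, sample, n_runs=5):
    """Repeat a (possibly stochastic) classifier on one sample and
    summarize its stability.

    `predict` is a hypothetical callable returning (label, confidence);
    it stands in for a GPT-4o classification call.
    """
    results = [predict(sample) for _ in range(n_runs)]
    labels = [label for label, _ in results]
    majority_label, majority_count = Counter(labels).most_common(1)[0]
    agreement = majority_count / n_runs  # 1.0 = fully stable predictions
    mean_confidence = sum(conf for _, conf in results) / n_runs
    return majority_label, agreement, mean_confidence

# Toy deterministic stand-in for the model call:
fake_predict = lambda sample: ("fake", 0.9)
label, agreement, mean_conf = assess_stability(
    fake_predict, {"text": "...", "image": None}
)
```

A low agreement score flags samples where the model's classification is unreliable, which is the kind of signal the paper's dual-dimension stability assessment is built on.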
Problem


Detecting multimodal disinformation that combines text and images
Evaluating GPT-4o's capabilities for false-information detection
Developing a robust framework for automated disinformation analysis
Innovation


Leveraging the GPT-4o model for multimodal disinformation detection
Implementing structured joint preprocessing for text and images
Defining six evaluation criteria with a confidence-based self-assessment mechanism
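The joint preprocessing step can be sketched minimally: the text is truncated to fit the model's context window and the image is encoded for inline submission. The token limit, the characters-per-token heuristic, and the helper names below are illustrative assumptions, not values from the paper.

```python
import base64

MAX_TOKENS = 4096       # illustrative context budget, not the paper's value
CHARS_PER_TOKEN = 4     # rough heuristic for estimating token usage

def truncate_text(text, max_tokens=MAX_TOKENS):
    """Crudely cap text length so the image-text pair stays within
    the model's token limitations."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    return text if len(text) <= max_chars else text[:max_chars]

def encode_image(image_bytes):
    """Base64-encode raw JPEG bytes as an inline data URL, the common
    format for passing images to multimodal chat APIs."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return "data:image/jpeg;base64," + encoded
```

In practice, token counting would use the model's own tokenizer and images may additionally be resized, but the character-based cap shows the shape of the constraint.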
Yasmina Kheddache
DĂ©partement d’informatique et recherche opĂ©rationnelle (D.I.R.O.), UniversitĂ© de MontrĂ©al, Pavillon AndrĂ©-Aisenstadt, 2920, chemin de la Tour, MontrĂ©al (QC) H3T 1N8
Marc Lalonde
CRIM
Computer vision