Effectiveness of Large Multimodal Models in Detecting Disinformation: Experimental Results

📅 2025-09-26
đŸ€– AI Summary
This study addresses the challenge of multimodal fake news detection from image-text pairs. Methodologically, it proposes a systematic analysis framework powered by GPT-4o, integrating prompt engineering optimization, joint image-text preprocessing, and a structured multimodal reasoning pipeline. A six-dimensional fine-grained evaluation taxonomy, equipped with an internal self-assessment mechanism, is introduced for classification. Crucially, the framework incorporates a dual-dimension stability assessment based on prediction confidence and inter-prediction variability, markedly enhancing interpretability and robustness. Extensive experiments across five heterogeneous benchmarks (GossipCop, PolitiFact, Fakeddit, MMFakeBench, and AMMEBA) demonstrate superior cross-domain detection accuracy and result consistency over baseline methods. Moreover, the analysis explicitly reveals current large multimodal models' limitations in semantic contradiction identification and context-dependent reasoning. The framework provides a reproducible, principled methodology for trustworthy multimodal AI.

📝 Abstract
The proliferation of disinformation, particularly in multimodal contexts combining text and images, presents a significant challenge across digital platforms. This study investigates the potential of large multimodal models (LMMs) in detecting and mitigating false information. We approach multimodal disinformation detection by leveraging the advanced capabilities of the GPT-4o model. Our contributions include: (1) the development of an optimized prompt incorporating advanced prompt engineering techniques to ensure precise and consistent evaluations; (2) the implementation of a structured framework for multimodal analysis, including a preprocessing methodology for images and text to comply with the model's token limitations; (3) the definition of six specific evaluation criteria that enable a fine-grained classification of content, complemented by a self-assessment mechanism based on confidence levels; (4) a comprehensive performance analysis of the model across multiple heterogeneous datasets (GossipCop, PolitiFact, Fakeddit, MMFakeBench, and AMMEBA), highlighting GPT-4o's strengths and limitations in disinformation detection; (5) an investigation of prediction variability through repeated testing, evaluating the stability and reliability of the model's classifications; and (6) the introduction of confidence-level and variability-based evaluation methods. These contributions provide a robust and reproducible methodological framework for automated multimodal disinformation analysis.
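The variability-based evaluation described in contributions (5) and (6) can be illustrated with a small sketch: the same image-text pair is classified several times, and the majority label, inter-prediction agreement, and mean self-reported confidence are aggregated. The `predict` callable and its return shape are illustrative assumptions; the paper's actual GPT-4o prompt and scoring scheme are not reproduced here.

```python
from collections import Counter

def assess_stability(predict, sample, n_runs=5):
    """Repeat a (possibly stochastic) classifier on one sample and
    summarize its stability.

    `predict` is a hypothetical callable returning (label, confidence);
    it stands in for a GPT-4o classification call.
    """
    results = [predict(sample) for _ in range(n_runs)]
    labels = [label for label, _ in results]
    majority_label, majority_count = Counter(labels).most_common(1)[0]
    agreement = majority_count / n_runs  # 1.0 = fully stable predictions
    mean_confidence = sum(conf for _, conf in results) / n_runs
    return majority_label, agreement, mean_confidence

# Toy deterministic stand-in for the model call:
fake_predict = lambda sample: ("fake", 0.9)
label, agreement, mean_conf = assess_stability(
    fake_predict, {"text": "...", "image": None}
)
```

A low agreement score flags samples where the model's classification is unreliable, which is the kind of signal the paper's dual-dimension stability assessment is built on.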
Problem


Detecting multimodal disinformation that combines text and images
Evaluating GPT-4o's capabilities for false-information detection
Developing a robust framework for automated disinformation analysis
Innovation


Leveraging the GPT-4o model for multimodal disinformation detection
Implementing structured joint preprocessing for text and images
Defining six evaluation criteria with a confidence-based self-assessment mechanism
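The joint preprocessing step can be sketched minimally: the text is truncated to fit the model's context window and the image is encoded for inline submission. The token limit, the characters-per-token heuristic, and the helper names below are illustrative assumptions, not values from the paper.

```python
import base64

MAX_TOKENS = 4096       # illustrative context budget, not the paper's value
CHARS_PER_TOKEN = 4     # rough heuristic for estimating token usage

def truncate_text(text, max_tokens=MAX_TOKENS):
    """Crudely cap text length so the image-text pair stays within
    the model's token limitations."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    return text if len(text) <= max_chars else text[:max_chars]

def encode_image(image_bytes):
    """Base64-encode raw JPEG bytes as an inline data URL, the common
    format for passing images to multimodal chat APIs."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return "data:image/jpeg;base64," + encoded
```

In practice, token counting would use the model's own tokenizer and images may additionally be resized, but the character-based cap shows the shape of the constraint.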
Yasmina Kheddache
DĂ©partement d’informatique et recherche opĂ©rationnelle (D.I.R.O.), UniversitĂ© de MontrĂ©al, Pavillon AndrĂ©-Aisenstadt, 2920, chemin de la Tour, MontrĂ©al (QC) H3T 1N8
Marc Lalonde
CRIM
Computer vision