Enhancing Multimodal Misinformation Detection by Replaying the Whole Story from Image Modality Perspective

📅 2025-11-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: In multimodal fake news detection, the image modality is often weakly representative and less informative than the text, because an image typically depicts only a local aspect of an event while the text describes the event as a whole. Method: We propose a text-segmentation-driven image sequence augmentation framework: the input text is decomposed into semantically coherent segments; a pretrained text-to-image generative model synthesizes a corresponding image sequence; and an image relation graph is constructed over those images. We further introduce a joint optimization objective based on text-image mutual information and image-label mutual information to enhance cross-modal consistency and discriminability. Contribution/Results: Our method requires no additional human annotations and effectively compensates for the information deficiency of the image modality. Extensive experiments on multiple benchmark datasets demonstrate significant improvements over state-of-the-art methods, validating the effectiveness and generalizability of event-level image sequence modeling and graph-structured mutual information learning.

📝 Abstract
Multimodal Misinformation Detection (MMD) is the task of detecting social media posts that contain misinformation, where a post typically includes both text and image modalities. From observing MMD posts, however, we argue that the text modality may be much more informative than the image modality, because the text generally describes the whole event or story of the post while the image often presents only partial scenes. Our preliminary empirical results indicate that the image modality indeed contributes less to MMD. Building on this observation, we propose a new MMD method named RETSIMD. Specifically, we suppose that each text can be divided into several segments, and that each segment describes a partial scene that can be presented by an image. Accordingly, we split the text into a sequence of segments and feed these segments into a pre-trained text-to-image generator to produce an augmenting sequence of images. We incorporate two auxiliary objectives concerning text-image and image-label mutual information, and post-train the generator on an auxiliary text-to-image generation benchmark dataset. Additionally, we propose a graph structure that defines three heuristic relationships between images, and use a graph neural network to generate the fused features. Extensive empirical results validate the effectiveness of RETSIMD.
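The segment-then-generate step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the RETSIMD implementation: the abstract does not say how segments are chosen, so the sentence-based splitter, the `sentences_per_segment` parameter, and the function names are all assumptions, and `placeholder_generator` is a hypothetical stand-in for a real pre-trained text-to-image model.

```python
import re
from typing import Any, Callable, List, Tuple


def split_into_segments(text: str, sentences_per_segment: int = 1) -> List[str]:
    """Split a post into segments by grouping consecutive sentences.

    Assumption: sentence boundaries approximate the 'partial scenes'
    mentioned in the abstract; the paper may segment differently.
    """
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_segment])
        for i in range(0, len(sentences), sentences_per_segment)
    ]


def augment_image_sequence(
    text: str,
    generate_image: Callable[[str], Any],
    sentences_per_segment: int = 1,
) -> List[Tuple[str, Any]]:
    """Return (segment, generated image) pairs, one per text segment."""
    segments = split_into_segments(text, sentences_per_segment)
    return [(segment, generate_image(segment)) for segment in segments]


def placeholder_generator(prompt: str) -> str:
    """Hypothetical stand-in for a text-to-image model; returns a tag string."""
    return f"<image for: {prompt}>"


if __name__ == "__main__":
    post = "A flood hit the town. Rescue teams arrived! Was the photo real?"
    for segment, image in augment_image_sequence(post, placeholder_generator):
        print(segment, "->", image)
```

In practice `generate_image` would wrap whichever text-to-image model is available; keeping it as a plain callable lets the segmentation logic be tested without loading a model.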
Problem

Research questions and friction points this paper is trying to address.

Detecting misinformation in multimodal social media posts
Addressing image modality's limited contribution to detection
Augmenting partial image scenes with generated full-story images
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generating images from text segments using a pre-trained text-to-image generator
Adding auxiliary objectives based on text-image and image-label mutual information
Fusing image features with a graph neural network over heuristic image relationships
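The graph-based fusion idea can be illustrated with a small sketch. The summary does not specify the three heuristic relationships, so the ones used here (original-to-generated, sequential order, and feature similarity) are assumptions chosen only for illustration, and the single round of unweighted mean aggregation stands in for a trained graph neural network. All function names and the `threshold` parameter are hypothetical.

```python
import math
from typing import List, Set, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_edges(features: List[Vector], threshold: float = 0.9) -> Set[Tuple[int, int]]:
    """Build undirected edges; node 0 is the original image, nodes 1.. are generated.

    The three edge types below are illustrative assumptions, not the paper's.
    """
    n = len(features)
    edges: Set[Tuple[int, int]] = set()
    for i in range(1, n):
        edges.add((0, i))  # original image linked to every generated image
    for i in range(1, n - 1):
        edges.add((i, i + 1))  # consecutive generated images (story order)
    for i in range(1, n):
        for j in range(i + 1, n):
            if cosine(features[i], features[j]) >= threshold:
                edges.add((i, j))  # visually similar generated images
    return edges


def mean_aggregate(features: List[Vector], edges: Set[Tuple[int, int]]) -> List[Vector]:
    """One message-passing round: each node averages itself and its neighbors."""
    if not features:
        return []
    neighbors = [{i} for i in range(len(features))]  # self-loops
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    dim = len(features[0])
    return [
        [sum(features[k][d] for k in nbrs) / len(nbrs) for d in range(dim)]
        for nbrs in neighbors
    ]


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
    graph = build_edges(feats)
    print(sorted(graph))
    print(mean_aggregate(feats, graph))
```

A learned model would replace the plain average with trainable transformations and possibly edge-type-specific weights; the sketch only shows how image nodes and heuristic edges could be wired together before fusion.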
Bing Wang
College of Computer Science and Technology, Jilin University, Changchun, Jilin, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of the MoE, Jilin University, Changchun, Jilin, China
Ximing Li
Jilin University, China; RIKEN AIP, Japan
Weakly-supervised learning, Misinformation analysis
Yanjun Wang
Key Laboratory of Symbolic Computation and Knowledge Engineering of the MoE, Jilin University, Changchun, Jilin, China; College of Software, Jilin University, China
Changchun Li
Jilin University
Text Classification, Topic Modeling, Weakly Supervised Learning, Partial Label Learning, Semi-supervised Learning
Lin Yuanbo Wu
Swansea University
Computer Vision, AI Generation, Trustworthy AI, Autonomous System, Embodied Visual Intelligence
Buyu Wang
College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
Shengsheng Wang
College of Computer Science and Technology, Jilin University, Changchun, Jilin, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of the MoE, Jilin University, Changchun, Jilin, China