🤖 AI Summary
Existing approaches to multimodal fake news detection struggle to adequately model the complex semantic relationships between images and text. To address this limitation, this work proposes MViR, a novel framework that introduces, for the first time, a multi-view vision-language modeling paradigm. MViR captures semantic variations of the same image under different textual contexts through a multi-view representation module, extracts multi-granularity image-text alignment features using pyramid dilated convolutions, and integrates diverse semantic cues via a multi-aggregation mechanism. Experiments on multiple benchmark datasets show that MViR outperforms state-of-the-art methods in multimodal fake news detection.
📝 Abstract
With the rise of online social networks, detecting fake news accurately is essential for a healthy online environment. While existing methods have advanced multimodal fake news detection, they often neglect the multi-view visual-semantic aspects of news, such as different text perspectives of the same image. To address this, we propose a Multi-View Visual-Semantic Representation (MViR) framework. Our approach includes a Multi-View Representation module using pyramid dilated convolution to capture multi-view visual-semantic features, a Multi-View Feature Fusion module to integrate these features with text, and multiple aggregators to extract multi-view semantic cues for detection. Experiments on benchmark datasets demonstrate the superiority of MViR. The source code of MViR is available at https://github.com/FlowerinZDF/FakeNews-MVIR.
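The abstract does not spell out how pyramid dilated convolution works, so the following is only a toy illustration of the general idea, not the authors' implementation. It applies one kernel to a 1-D sequence at several dilation rates, so each rate covers a different span of context, and then averages the resulting views. The function names (`dilated_conv1d`, `pyramid_dilated_features`, `mean_aggregate`), the dilation rates `(1, 2, 4)`, and the mean aggregator are all assumptions made for illustration; the actual MViR modules presumably operate on learned image-text features.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Zero-padded 1-D dilated convolution (cross-correlation), same-length output."""
    center = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Kernel taps are spaced `dilation` positions apart.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(x):  # positions outside the input count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def pyramid_dilated_features(
    x: Sequence[float],
    kernel: Sequence[float],
    dilations: Sequence[int] = (1, 2, 4),  # hypothetical rates, not from the paper
) -> List[List[float]]:
    """Return one feature sequence ("view") per dilation rate."""
    return [dilated_conv1d(x, kernel, d) for d in dilations]


def mean_aggregate(views: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical aggregator: position-wise mean across views."""
    n = len(views)
    return [sum(vals) / n for vals in zip(*views)]


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6]
    views = pyramid_dilated_features(signal, kernel=[1, 1, 1])
    for rate, view in zip((1, 2, 4), views):
        print(f"dilation={rate}: {view}")
    print("aggregated:", mean_aggregate(views))
```

With a kernel of size 3, dilation rates 1, 2, and 4 cover spans of 3, 5, and 9 positions respectively, which is the sense in which a pyramid of dilations yields multi-granularity features from the same input.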