🤖 AI Summary
To address weak modality-specific representation learning and the difficulty of explicitly uncovering semantic conflicts in multimodal fake news detection, this paper proposes a hierarchical unimodal modeling framework coupled with an inverse-attention-driven cross-modal inconsistency identification mechanism. The framework designs an inverse attention mechanism that explicitly models semantic conflicts both within each modality (e.g., within the text or the image) and across modalities (between text and image). In addition, a hierarchical graph neural network jointly captures local–local and local–global dependencies. On multiple benchmark datasets, the method consistently outperforms state-of-the-art approaches, with an average accuracy improvement of 3.2% and improved robustness. The framework offers an interpretable, highly discriminative paradigm for fake news detection on social media, enabling transparent reasoning about modality-wise inconsistencies.
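The summary does not spell out how "inverse attention" is computed, so the following is only a minimal sketch of one plausible reading: negate the scaled dot-product scores before the softmax, so that attention mass shifts onto low-similarity (potentially conflicting) token pairs instead of agreeing ones. The shapes and the `inverse_attention` helper below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inverse_attention(Q, K, V):
    """Hypothetical inverse attention: emphasize dissimilar pairs."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # high where tokens agree
    inv_weights = softmax(-scores, axis=-1)  # negation -> weight on conflicts
    return inv_weights @ V, inv_weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))   # e.g. four text-token queries
K = rng.standard_normal((6, 8))   # e.g. six image-region keys
V = rng.standard_normal((6, 8))   # image-region values
out, w = inverse_attention(Q, K, V)
print(out.shape, w.shape)  # (4, 8) (4, 6)
```

Under this reading, a text token that matches no image region well still receives a meaningful conflict-weighted summary, which is what lets inconsistency become an explicit feature rather than a by-product of weak alignment.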
📝 Abstract
Multimodal fake news detection has garnered significant attention due to its profound implications for social security. While existing approaches have contributed to understanding cross-modal consistency, they often fail to leverage modality-specific representations and explicit discrepant features. To address these limitations, we propose a Multimodal Inverse Attention Network (MIAN), a novel framework that explores intrinsic discriminative features in news content to advance fake news detection. Specifically, MIAN introduces a hierarchical learning module that captures diverse intra-modal relationships through local-to-global and local-to-local interactions, generating enhanced unimodal representations that improve the identification of fake news at the intra-modal level. Additionally, a cross-modal interaction module employs a co-attention mechanism to establish and model dependencies between the refined unimodal representations, facilitating seamless semantic integration across modalities. To explicitly extract inconsistency features, we propose an inverse attention mechanism that highlights the conflicting patterns and semantic deviations introduced by fake news at both the intra- and inter-modal levels. Extensive experiments on benchmark datasets demonstrate that MIAN significantly outperforms state-of-the-art methods, underscoring its contribution to social security through enhanced multimodal fake news detection.
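The abstract names a co-attention mechanism over the refined unimodal representations but gives no equations; the sketch below shows the standard affinity-matrix form of co-attention, which may differ from MIAN's exact design. The bilinear weight `W` and all tensor shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(T, I, W):
    """Standard bilinear co-attention sketch (not MIAN's exact form).

    T: (n_text, d) text token features; I: (n_img, d) image region
    features; W: (d, d) learned bilinear affinity weights.
    """
    A = T @ W @ I.T                      # (n_text, n_img) affinity matrix
    text_ctx = softmax(A, axis=1) @ I    # each text token attends over regions
    img_ctx = softmax(A.T, axis=1) @ T   # each region attends over text tokens
    return text_ctx, img_ctx

rng = np.random.default_rng(1)
T = rng.standard_normal((5, 16))
I = rng.standard_normal((7, 16))
W = rng.standard_normal((16, 16))
tc, ic = co_attention(T, I, W)
print(tc.shape, ic.shape)  # (5, 16) (7, 16)
```

Each modality thus receives a context vector summarizing the other, which is the dependency-modeling step the abstract describes before the inverse attention extracts the conflicting residue.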