🤖 AI Summary
To address three key challenges in multimodal fake news detection (insufficient understanding of image semantics, limited textual information, and neglect of differences between emotional categories), this paper proposes the Knowledge Augmentation and Emotion Guidance Network (KEN). KEN integrates image captions generated by a large vision-language model (LVLM) with externally retrieved evidence to strengthen cross-modal semantic understanding, and it introduces balanced learning over emotion categories to model the relationship between emotional type and veracity at a fine granularity. By jointly optimizing the knowledge-augmentation and emotion-guidance modules, KEN improves discrimination even when textual information is scarce. Extensive experiments on two real-world benchmark datasets show that KEN consistently outperforms state-of-the-art methods across multiple metrics, demonstrating its effectiveness, generalizability, and robustness in multimodal fake news detection.
📝 Abstract
In recent years, the rampant spread of misinformation on social media has made accurate detection of multimodal fake news a critical research focus. However, previous methods do not adequately understand the semantics of images, and they struggle to judge news authenticity when textual information is limited. Moreover, treating all emotional types of news uniformly, without tailored handling, further degrades performance. We therefore propose a novel Knowledge Augmentation and Emotion Guidance Network (KEN). On the one hand, we leverage the powerful semantic understanding and extensive world knowledge of a large vision-language model (LVLM): for images, generated captions provide a comprehensive account of image content and scene; for text, retrieved external evidence breaks the information silo created by short, closed news text and limited context. On the other hand, we account for inter-class differences among emotional types of news through balanced learning, achieving fine-grained modeling of the relationship between emotional type and authenticity. Extensive experiments on two real-world datasets demonstrate the superiority of our KEN.
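The abstract does not specify how the balanced learning over emotion categories is implemented. A common way to realize this idea, and a minimal sketch of what such a mechanism could look like, is to reweight each sample's classification loss inversely to the frequency of its emotion category, so that rare emotional types are not drowned out by common ones. The function below is a hypothetical illustration, not the paper's actual loss:

```python
import math

def emotion_balanced_loss(probs, labels, emotions):
    """Hypothetical sketch of emotion-category-aware balanced learning.

    probs:    predicted probability of "fake" for each sample (0 < p < 1)
    labels:   ground-truth veracity labels (1 = fake, 0 = real)
    emotions: emotion-category tag of each sample (e.g. "anger", "joy")

    Each sample's binary cross-entropy is weighted inversely to the
    frequency of its emotion category, normalized so weights average 1.
    """
    # Count samples per emotion category.
    counts = {}
    for e in emotions:
        counts[e] = counts.get(e, 0) + 1
    n, k = len(emotions), len(counts)
    # Inverse-frequency weight: rare categories receive larger weights.
    weights = {e: n / (k * c) for e, c in counts.items()}
    total = 0.0
    for p, y, e in zip(probs, labels, emotions):
        ce = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[e] * ce
    return total / n
```

Under this weighting, a misclassified sample from an under-represented emotion category contributes more to the loss than an equally misclassified sample from a dominant one, which is one standard way to counteract class imbalance during training.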