RASR: Retrieval-Augmented Semantic Reasoning for Fake News Video Detection

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two limitations of existing multimodal fake news video detection methods: they struggle to model global semantic relationships across samples, and they generalize poorly across domains. To overcome these challenges, the authors propose the RASR framework, comprising a Cross-instance Semantic Parser and Retriever (CSPR), a Domain-Guided Multimodal Reasoning (DGMR) module, and a Multi-View Feature Decoupling and Fusion (MVDFF) module. By retrieving associative evidence from a dynamic memory bank, incorporating domain priors, and decoupling multimodal features, RASR improves both semantic comprehension and cross-domain adaptability. Experiments on the FakeSV and FakeTT datasets show that RASR outperforms state-of-the-art approaches, improving detection accuracy by up to 0.93% and generalizing better across domains.
📝 Abstract
Multimodal fake news video detection is a crucial research direction for maintaining the credibility of online information. Existing studies primarily verify content authenticity by constructing multimodal feature fusion representations or by using pre-trained language models to analyze video-text consistency. However, these methods still face the following limitations: (1) they lack cross-instance global semantic correlations, making it difficult to use historical associative evidence to verify the current video; (2) semantic discrepancies across domains hinder the transfer of general knowledge, and the methods lack the guidance of domain-specific expert knowledge. To this end, we propose a novel Retrieval-Augmented Semantic Reasoning (RASR) framework. First, a Cross-instance Semantic Parser and Retriever (CSPR) deconstructs the video into high-level semantic primitives and retrieves relevant associative evidence from a dynamic memory bank. Subsequently, a Domain-Guided Multimodal Reasoning (DGMR) module incorporates domain priors to drive an expert multimodal large language model in generating domain-aware, in-depth analysis reports. Finally, a Multi-View Feature Decoupling and Fusion (MVDFF) module integrates multi-dimensional features through an adaptive gating mechanism to achieve robust authenticity determination. Extensive experiments on the FakeSV and FakeTT datasets demonstrate that RASR significantly outperforms state-of-the-art baselines, achieves superior cross-domain generalization, and improves the overall detection accuracy by up to 0.93%.
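The abstract names two mechanisms without giving their details: retrieval of associative evidence from a dynamic memory bank, and fusion of feature views through an adaptive gate. The sketch below is only an illustration of those two ideas under stated assumptions, not the authors' implementation. The `MemoryBank` class, the FIFO eviction policy, cosine similarity as the retrieval score, and the softmax gate in `gated_fusion` are all hypothetical choices; in the paper the embeddings and gate values would come from learned encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryBank:
    """Hypothetical dynamic memory bank holding (embedding, label) pairs.

    Assumption: the oldest entry is evicted once capacity is exceeded (FIFO).
    The paper does not specify an update policy in the abstract.
    """

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.items = []

    def add(self, embedding, label):
        self.items.append((list(embedding), label))
        if len(self.items) > self.capacity:
            self.items.pop(0)  # drop the oldest entry

    def retrieve(self, query, k=2):
        """Return the k stored entries most similar to the query as (score, label)."""
        scored = [(cosine(query, emb), label) for emb, label in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def gated_fusion(views, gate_logits):
    """Fuse several feature views with softmax gate weights.

    Assumption: one scalar gate logit per view; a real model would predict
    these logits from the input rather than receive them as arguments.
    """
    peak = max(gate_logits)
    exps = [math.exp(g - peak) for g in gate_logits]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(views[0])
    fused = [sum(w * v[i] for w, v in zip(weights, views)) for i in range(dim)]
    return fused, weights


# Usage: store three toy embeddings in a bank of capacity 2, then query it.
bank = MemoryBank(capacity=2)
bank.add([1.0, 0.0], "fake")
bank.add([0.0, 1.0], "real")
bank.add([1.0, 1.0], "fake")  # evicts the first entry
evidence = bank.retrieve([1.0, 0.0], k=1)

# Usage: fuse two views with equal gate logits, which gives equal weights.
fused, weights = gated_fusion([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

The retrieved entries stand in for the "historical associative evidence" that the abstract describes, and the fused vector stands in for the representation passed to the final authenticity classifier.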
Problem

Research questions and friction points this paper is trying to address.

fake news video detection
multimodal reasoning
cross-instance semantic correlation
domain-specific knowledge
semantic discrepancy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Reasoning
Cross-instance Semantic Parsing
Domain-Guided Multimodal Reasoning
Multimodal Fake News Detection
Dynamic Memory Bank
Hui Li
Xiamen University
Information Retrieval, Data Mining, Data Management
Peien Ding
School of Informatics, Xiamen University
Jun Li
School of Computer Science and Information Security, Guilin University of Electronic Technology
Guoqi Ma
School of Informatics, Xiamen University
Zhanyu Liu
Shanghai Jiao Tong University
Recommendation System, Large Language Model, Data Mining, Time Series Analysis
Ge Xu
School of Computer and Big Data, Minjiang University
Junfeng Yao
School of Film, School of Informatics, Institute of Artificial Intelligence, Xiamen Key Laboratory of Intelligent Storage and Computing, Xiamen University
Jinsong Su
Xiamen University
Natural Language Processing, Deep Learning, Neural Machine Translation