Retrieval Augmented Verification for Zero-Shot Detection of Multimodal Disinformation

🤖 AI Summary
Problem: Multimodal disinformation, such as manipulated or repurposed images paired with misleading text, is hard for conventional fact-checking on social media to catch, because the deception often lies in a semantic mismatch between the image and the text. Method: We propose a retrieval-augmented, zero-shot verification framework for multimodal disinformation. It constructs a claim-driven graph that models the entities and relations in a claim, combines this graph with CLIP-based visual features, and retrieves trustworthy external evidence in real time to detect image-text semantic misalignment. The framework supports fine-grained, interpretable, element-wise verification, explicitly labeling which image and text elements are credible and which are suspicious. Contribution/Results: The approach needs no labeled training data, achieves performance competitive with state-of-the-art methods, and produces transparent, traceable verification reports grounded in explicit evidence.

📝 Abstract
The rise of disinformation on social media, especially through the strategic manipulation or repurposing of images paired with provocative text, presents a complex challenge for traditional fact-checking methods. In this paper, we introduce a novel zero-shot approach to identify and interpret such multimodal disinformation, leveraging real-time evidence from credible sources. Our framework goes beyond simple true-or-false classification by analyzing both the textual and visual components of social media claims in a structured, interpretable manner. By constructing a graph-based representation of the entities and relationships within the claim, combined with pretrained visual features, our system automatically retrieves and matches external evidence to identify inconsistencies. Unlike traditional models that depend on labeled datasets, our method gives users transparency, showing exactly which aspects of the claim hold up to scrutiny and which do not. Our framework achieves performance competitive with state-of-the-art methods while offering enhanced explainability.
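
The element-wise matching described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Triple` type, the `verify_claim` function, the three labels, and the toy claim and evidence are all hypothetical, and the retrieval step is replaced by a hard-coded evidence set. The "contradicted" rule also assumes each relation takes a single value per subject.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One edge of a claim graph: subject --relation--> obj (hypothetical type)."""
    subject: str
    relation: str
    obj: str


def verify_claim(claim: list[Triple], evidence: set[Triple]) -> dict[Triple, str]:
    """Label each claim triple by comparing it with evidence triples.

    Assumption: every relation is single-valued, so evidence with the same
    subject and relation but a different object counts as a contradiction.
    """
    report: dict[Triple, str] = {}
    for t in claim:
        if t in evidence:
            report[t] = "supported"
        elif any(e.subject == t.subject and e.relation == t.relation for e in evidence):
            report[t] = "contradicted"
        else:
            report[t] = "unverified"  # no evidence covers this element
    return report


# Toy claim: a post says a photo shows a flood in Paris in 2023.
claim = [
    Triple("photo", "taken_in", "Paris"),
    Triple("photo", "taken_on", "2023"),
    Triple("photo", "shows", "flood"),
]

# Stand-in for retrieved evidence (a real system would fetch this at query time).
evidence = {
    Triple("photo", "taken_in", "Paris"),
    Triple("photo", "taken_on", "2016"),
}

report = verify_claim(claim, evidence)
for t, label in report.items():
    print(f"{t.subject} {t.relation} {t.obj}: {label}")
```

Reporting a label per triple, rather than one verdict for the whole post, is what lets a reader see that the location holds up while the date does not.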

Problem

Research questions and friction points this paper is trying to address.

Detect multimodal disinformation using zero-shot methods
Analyze text and visuals for inconsistencies with credible evidence
Provide transparent, explainable verification without labeled datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot multimodal disinformation detection using real-time evidence
Graph-based entity and relationship analysis for inconsistency identification
Pretrained visual features combined with textual analysis for transparency
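
One common way to use pretrained features without any task-specific training is to compare image and text embeddings directly. The sketch below assumes the embeddings have already been produced by some pretrained encoder and uses hand-written toy vectors instead. The `flag_mismatch` function and its `threshold` value are hypothetical choices for illustration, not details taken from the paper.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_mismatch(image_vec: list[float], text_vec: list[float],
                  threshold: float = 0.5) -> bool:
    """Return True when image and text embeddings look semantically misaligned.

    The threshold is an assumed, hand-picked value; no labeled data is used.
    """
    return cosine(image_vec, text_vec) < threshold


# Toy embeddings standing in for the output of a pretrained encoder.
image_vec = [0.9, 0.1, 0.0]
matching_text = [0.8, 0.2, 0.1]
unrelated_text = [0.0, 0.1, 0.9]

print(flag_mismatch(image_vec, matching_text))
print(flag_mismatch(image_vec, unrelated_text))
```

A single similarity score only signals that something is off. Explaining which element is inconsistent still requires comparing the claim with external evidence.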