Training-Free Multimodal Deepfake Detection via Graph Reasoning

📅 2025-09-25

🤖 AI Summary

Large vision-language models (LVLMs) struggle with multimodal deepfake detection (MDD) for three reasons: (1) they miss subtle forgery cues, (2) they fail to resolve cross-modal inconsistencies, and (3) their exemplar retrieval is poorly aligned with the detection task. To address these, we propose GASP-ICL (Guided Adaptive Scorer and Propagation In-Context Learning), a training-free framework. An MDD-adapted feature extractor first retrieves aligned image-text pairs to form a candidate set. The core component, the Graph-Structured Taylor Adaptive Scorer (GSTAS), then models relations among the candidate samples and propagates query-aligned signals to select semantically coherent, task-relevant exemplars. These exemplars serve as in-context demonstrations that inject task-aware knowledge into the LVLM. Evaluated on four forgery types, GASP-ICL outperforms strong baselines without any LVLM fine-tuning.

📝 Abstract

Multimodal deepfake detection (MDD) aims to uncover manipulations across visual, textual, and auditory modalities, thereby reinforcing the reliability of modern information systems. Although large vision-language models (LVLMs) exhibit strong multimodal reasoning, their effectiveness in MDD is limited by challenges in capturing subtle forgery cues, resolving cross-modal inconsistencies, and performing task-aligned retrieval. To this end, we propose Guided Adaptive Scorer and Propagation In-Context Learning (GASP-ICL), a training-free framework for MDD. GASP-ICL employs a pipeline to preserve semantic relevance while injecting task-aware knowledge into LVLMs. We leverage an MDD-adapted feature extractor to retrieve aligned image-text pairs and build a candidate set. We further design the Graph-Structured Taylor Adaptive Scorer (GSTAS) to capture cross-sample relations and propagate query-aligned signals, producing discriminative exemplars. This enables precise selection of semantically aligned, task-relevant demonstrations, enhancing LVLMs for robust MDD. Experiments on four forgery types show that GASP-ICL surpasses strong baselines, delivering gains without LVLM fine-tuning.
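
The retrieve-then-score flow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not give the GSTAS formula, so the sketch substitutes a truncated Taylor expansion of a heat-kernel-style propagation, exp(alpha * W) applied to the initial query similarities, over a cosine-similarity graph of the candidates. The function names (`retrieve`, `taylor_propagate`, `select_exemplars`) and the parameters `alpha`, `order`, `k`, and `m` are hypothetical, and plain Python lists stand in for the output of the MDD-adapted feature extractor.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query, pool, k):
    """Indices of the k pool embeddings most similar to the query."""
    ranked = sorted(range(len(pool)),
                    key=lambda i: cosine(query, pool[i]),
                    reverse=True)
    return ranked[:k]


def taylor_propagate(query, cands, alpha=0.5, order=3):
    """Score candidates with sum_{t=0..order} (alpha^t / t!) * W^t * s0.

    W is a row-normalized, non-negative similarity graph over the
    candidates and s0 holds each candidate's similarity to the query.
    This is an assumed stand-in, not the paper's scorer.
    """
    n = len(cands)
    W = [[max(0.0, cosine(cands[i], cands[j])) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    for row in W:
        total = sum(row)
        if total > 0:
            for j in range(n):
                row[j] /= total
    s0 = [cosine(query, c) for c in cands]
    scores = list(s0)   # t = 0 term
    term = list(s0)
    for t in range(1, order + 1):
        # term_t = (alpha / t) * W * term_{t-1}, i.e. alpha^t / t! * W^t * s0
        term = [alpha / t * sum(W[i][j] * term[j] for j in range(n))
                for i in range(n)]
        scores = [s + x for s, x in zip(scores, term)]
    return scores


def select_exemplars(query, pool, k=4, m=2):
    """Retrieve k candidates, propagate scores, keep the top m pool indices."""
    idx = retrieve(query, pool, k)
    scores = taylor_propagate(query, [pool[i] for i in idx])
    best = sorted(range(len(idx)), key=lambda p: scores[p], reverse=True)
    return [idx[p] for p in best[:m]]
```

With `order=0` the scores reduce to plain query similarity, so the extra terms isolate what the graph contributes: a candidate gains score when its neighbors in the candidate graph are themselves close to the query.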

Problem

Research questions and friction points this paper is trying to address.

Detecting multimodal deepfakes without fine-tuning the LVLM

Addressing cross-modal inconsistencies in forgery detection

Enhancing LVLM reasoning for subtle manipulation cues

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free framework for multimodal deepfake detection

Graph-structured scoring to capture cross-sample relations

Adaptive propagation of query-aligned signals for exemplar selection
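
Because the framework is training-free, the selected exemplars reach the LVLM only through the prompt. The sketch below shows one way such an in-context prompt could be assembled. The instruction wording, the `<image>` placeholder, the `real`/`manipulated` label strings, and the function name `build_icl_prompt` are all hypothetical; the source does not specify the prompt format.

```python
def build_icl_prompt(exemplars, query_caption):
    """Assemble an in-context prompt from (caption, label) exemplars.

    Each "<image>" marks where an image would be attached for the LVLM;
    the placeholder and the wording are illustrative assumptions.
    """
    lines = ["Decide whether each image-text pair is real or manipulated."]
    for n, (caption, label) in enumerate(exemplars, start=1):
        lines.append(f"Example {n}: <image> Caption: {caption} Answer: {label}")
    # The query is left unanswered so the model completes the label.
    lines.append(f"Query: <image> Caption: {query_caption} Answer:")
    return "\n".join(lines)


demo = [("A senator signs the bill.", "real"),
        ("A senator tears up the bill.", "manipulated")]
prompt = build_icl_prompt(demo, "Crowds celebrate the verdict.")
```

No model weights change at any point; only the demonstrations placed ahead of the query differ from one query to the next.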