Training-Free Multimodal Deepfake Detection via Graph Reasoning

📅 2025-09-25
🤖 AI Summary
Multimodal deepfake detection (MDD) with large vision-language models (LVLMs) faces three key challenges: (1) capturing subtle forgery cues, (2) resolving cross-modal inconsistencies, and (3) retrieving exemplars that are aligned with the detection task. To address these, we propose GASP-ICL (Guided Adaptive Scorer and Propagation In-Context Learning), a training-free framework that combines graph-structured adaptive scoring with in-context learning. Its core component is the Graph-Structured Taylor Adaptive Scorer (GSTAS), which models inter-sample relationships and propagates query-aligned signals to select semantically coherent, task-relevant exemplars. By combining an MDD-adapted feature extractor, graph-structured scoring, and in-context learning, GASP-ICL injects task-aware knowledge into LVLMs. In experiments on four forgery types, GASP-ICL surpasses strong baselines without any LVLM fine-tuning.

📝 Abstract
Multimodal deepfake detection (MDD) aims to uncover manipulations across visual, textual, and auditory modalities, thereby reinforcing the reliability of modern information systems. Although large vision-language models (LVLMs) exhibit strong multimodal reasoning, their effectiveness in MDD is limited by challenges in capturing subtle forgery cues, resolving cross-modal inconsistencies, and performing task-aligned retrieval. To this end, we propose Guided Adaptive Scorer and Propagation In-Context Learning (GASP-ICL), a training-free framework for MDD. GASP-ICL employs a pipeline to preserve semantic relevance while injecting task-aware knowledge into LVLMs. We leverage an MDD-adapted feature extractor to retrieve aligned image-text pairs and build a candidate set. We further design the Graph-Structured Taylor Adaptive Scorer (GSTAS) to capture cross-sample relations and propagate query-aligned signals, producing discriminative exemplars. This enables precise selection of semantically aligned, task-relevant demonstrations, enhancing LVLMs for robust MDD. Experiments on four forgery types show that GASP-ICL surpasses strong baselines, delivering gains without LVLM fine-tuning.
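The abstract does not spell out how the scorer uses a Taylor expansion, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that candidates and the query are already embedded as vectors, that the candidate graph is built from non-negative cosine affinities, and that propagation is a truncated power series approximating (I - alpha P)^(-1) s0. The function name `select_exemplars` and the parameters `alpha` and `order` are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors to unit length along the last axis."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def select_exemplars(query, candidates, k=2, alpha=0.5, order=3):
    """Rank candidates by query similarity propagated over a candidate graph.

    Hypothetical illustration: the series form, `alpha`, and `order` are
    assumptions, not details taken from the paper.
    """
    q = l2_normalize(query)
    c = l2_normalize(candidates)

    # Initial query-aligned signal: cosine similarity to the query.
    s0 = c @ q

    # Candidate graph: non-negative cosine affinities, no self-loops.
    w = np.clip(c @ c.T, 0.0, None)
    np.fill_diagonal(w, 0.0)

    # Row-normalise into a transition matrix; isolated nodes keep a zero row.
    row_sums = w.sum(axis=1, keepdims=True)
    p = np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)

    # Truncated series: sum over t = 0..order of alpha^t * P^t * s0.
    scores = s0.copy()
    term = s0.copy()
    for _ in range(order):
        term = alpha * (p @ term)
        scores += term

    top = np.argsort(-scores)[:k]
    return top, scores


if __name__ == "__main__":
    query = [1.0, 0.0]
    candidates = [[1.0, 0.1], [0.9, 0.2], [0.0, 1.0], [-1.0, 0.0]]
    top, scores = select_exemplars(query, candidates, k=2)
    print("selected indices:", top.tolist())
    print("scores:", np.round(scores, 3).tolist())
```

With `alpha=0` the ranking reduces to plain cosine retrieval, which makes the effect of the graph term easy to isolate when experimenting.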
Problem

Research questions and friction points this paper is trying to address.

Detecting multimodal deepfakes without training requirements
Addressing cross-modal inconsistencies in forgery detection
Enhancing LVLM reasoning for subtle manipulation cues
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free framework for multimodal deepfake detection
Graph-structured scoring to capture cross-sample relations
Adaptive propagation of query-aligned signals for exemplars
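As a rough illustration of the in-context learning step, the sketch below shows how selected exemplars might be assembled into a text prompt ahead of the query. The prompt wording, the `Exemplar` fields, and the `image_ref` placeholder are all assumptions made for this example; the source does not describe the prompt format or the LVLM interface.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    caption: str    # text paired with the image
    image_ref: str  # hypothetical placeholder for however the LVLM takes images
    label: str      # ground-truth label, e.g. "real" or "fake"


def build_icl_prompt(exemplars, query_caption, query_image_ref):
    """Lay out labeled demonstrations first, then the unlabeled query."""
    lines = ["Decide whether each image-text pair is real or fake."]
    for i, ex in enumerate(exemplars, start=1):
        lines.append(f"Example {i}: image={ex.image_ref} text={ex.caption!r}")
        lines.append(f"Answer: {ex.label}")
    # The query uses the same layout but leaves the answer open for the model.
    lines.append(f"Query: image={query_image_ref} text={query_caption!r}")
    lines.append("Answer:")
    return "\n".join(lines)


if __name__ == "__main__":
    demos = [
        Exemplar("A senator speaks at a rally.", "<img_1>", "real"),
        Exemplar("A celebrity endorses a product.", "<img_2>", "fake"),
    ]
    print(build_icl_prompt(demos, "A mayor opens a bridge.", "<img_q>"))
```

Because no weights are updated, the only task-specific knowledge the model receives is what these demonstrations carry, which is why exemplar selection matters in a training-free setting.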
Yuxin Liu
School of Internet, Anhui University, Hefei, China
Fei Wang
School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China
Kun Li
Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
Yiqi Nie
Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
Junjie Chen
Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
Yanyan Wei
Hefei University of Technology (HFUT)
Robust Image Perception, LLM, AI Agent
Zhangling Duan
Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
Zhaohong Jia
School of Internet, Anhui University, Hefei, China