AI Summary
Multimodal deepfake detection (MDD) faces three key challenges: (1) difficulty in capturing subtle forgery cues, (2) inability to identify cross-modal inconsistencies, and (3) poor task alignment during retrieval. To address these, we propose GASP-ICL, a training-free framework leveraging graph-structured adaptive scoring and in-context learning. Its core innovation is the Graph-Structured Taylor Adaptive Scorer (GSTAS), which models inter-sample relationships and propagates query-aligned signals for semantically coherent, task-directed exemplar selection. Integrating an MDD-specific feature extractor, graph-structured reasoning, and in-context learning, GASP-ICL injects task-aware knowledge into large vision-language models. Evaluated across four diverse forgery scenarios, GASP-ICL significantly outperforms strong baselines without any fine-tuning, achieving robust and generalizable multimodal deepfake detection.
Abstract
Multimodal deepfake detection (MDD) aims to uncover manipulations across visual, textual, and auditory modalities, thereby reinforcing the reliability of modern information systems. Although large vision-language models (LVLMs) exhibit strong multimodal reasoning, their effectiveness in MDD is limited by challenges in capturing subtle forgery cues, resolving cross-modal inconsistencies, and performing task-aligned retrieval. To address these challenges, we propose Guided Adaptive Scorer and Propagation In-Context Learning (GASP-ICL), a training-free framework for MDD. GASP-ICL employs a retrieval-and-scoring pipeline that preserves semantic relevance while injecting task-aware knowledge into LVLMs. We leverage an MDD-adapted feature extractor to retrieve aligned image-text pairs and build a candidate set. We further design the Graph-Structured Taylor Adaptive Scorer (GSTAS) to capture cross-sample relations and propagate query-aligned signals, producing discriminative exemplars. This enables precise selection of semantically aligned, task-relevant demonstrations, enhancing LVLMs for robust MDD. Experiments on four forgery types show that GASP-ICL surpasses strong baselines, delivering gains without LVLM fine-tuning.
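To make the exemplar-selection idea concrete, the sketch below illustrates graph-structured scoring with label-propagation-style smoothing: candidates are first scored by cosine similarity to the query, the scores are then propagated over a kNN affinity graph among candidates, and the top-ranked samples are kept as in-context demonstrations. This is a minimal stand-in for the general scheme, not the paper's GSTAS (in particular, it omits the Taylor-expansion-based scoring); `select_exemplars` and all parameter names are hypothetical.

```python
import numpy as np

def select_exemplars(query_emb, cand_embs, k_neighbors=3,
                     alpha=0.3, n_iter=20, n_select=2):
    """Rank candidate exemplars by query similarity propagated over a kNN graph."""
    # Normalize so dot products are cosine similarities.
    q = query_emb / np.linalg.norm(query_emb)
    C = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)

    # Initial query-alignment score for every candidate.
    s0 = C @ q

    # Build a kNN affinity graph among candidates (no self-edges).
    sim = C @ C.T
    np.fill_diagonal(sim, -np.inf)
    A = np.zeros_like(sim)
    for i in range(len(C)):
        nbrs = np.argsort(sim[i])[-k_neighbors:]
        A[i, nbrs] = sim[i, nbrs]
    A = np.clip(np.maximum(A, A.T), 0.0, None)   # symmetrize, keep nonnegative
    W = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)

    # Propagate the query-aligned signal over the graph.
    s = s0.copy()
    for _ in range(n_iter):
        s = alpha * (W @ s) + (1 - alpha) * s0

    # Highest-scoring candidates become in-context demonstrations.
    return np.argsort(s)[::-1][:n_select]
```

A usage example: with a CLIP-style embedding of the query pair and a matrix of candidate embeddings, `select_exemplars(q, C)` returns the indices of the candidates to format as demonstrations in the LVLM prompt. Propagation favors candidates that are both similar to the query and embedded in a neighborhood of query-similar samples, which is the intuition behind "semantically coherent, task-directed" selection.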