MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM

📅 2025-07-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current vision-language models (VLMs) for face forgery detection can classify authenticity and generate explanations but overlook anomalous patterns in facial quality attributes and lack forgery-aware training mechanisms. To address this, we propose a forgery-perception-oriented multi-granularity prompting framework. First, we construct DD-VQA+, a new dataset annotated with fine-grained facial quality attributes. Second, we design an attribute-driven hybrid LoRA fine-tuning strategy to enhance forgery sensitivity. Third, we introduce multiple auxiliary losses explicitly aligned with forgery characteristics. Finally, we establish an end-to-end mapping from classification/segmentation outputs to interpretable textual prompts. Experiments demonstrate that our method significantly outperforms state-of-the-art approaches on text-guided forgery discrimination and attribution tasks, achieving simultaneous improvements in detection accuracy and explanation plausibility.
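The summary mentions multiple auxiliary losses aligned with forgery characteristics but does not say which losses are used or how they are weighted. As a rough illustration only, the sketch below assumes one classification loss (binary cross-entropy) and one segmentation loss (Dice) added to the text-generation loss with fixed weights. The function names, the choice of losses, and the weights are all assumptions for illustration, not the paper's formulation.

```python
import math
from typing import Dict, Sequence


def binary_cross_entropy(prob_fake: float, is_fake: int, eps: float = 1e-7) -> float:
    """Image-level real/fake loss (assumed auxiliary term)."""
    p = min(max(prob_fake, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(is_fake * math.log(p) + (1 - is_fake) * math.log(1.0 - p))


def dice_loss(pred: Sequence[float], target: Sequence[int], eps: float = 1e-7) -> float:
    """Pixel-level forgery-mask loss over flattened masks (assumed auxiliary term)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def total_loss(
    text_loss: float,
    aux_losses: Dict[str, float],
    weights: Dict[str, float],
) -> float:
    """Text-generation loss plus a weighted sum of auxiliary losses."""
    return text_loss + sum(weights[name] * value for name, value in aux_losses.items())


# Usage with made-up values: the weights here are hypothetical hyperparameters.
aux = {
    "cls": binary_cross_entropy(prob_fake=0.8, is_fake=1),
    "seg": dice_loss(pred=[0.9, 0.1, 0.0, 0.8], target=[1, 0, 0, 1]),
}
loss = total_loss(text_loss=1.2, aux_losses=aux, weights={"cls": 0.5, "seg": 0.5})
```

In a real training loop these terms would be computed on tensors by an autodiff framework; the plain-float version only shows how the terms combine.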

📝 Abstract

Recent studies have used visual large language models (VLMs) to answer not only "Is this face a forgery?" but also "Why is the face a forgery?" These studies introduced forgery-related attributes, such as forgery location and type, to construct deepfake VQA datasets and train VLMs, achieving high accuracy while providing human-understandable textual explanations. However, these methods still have limitations. For example, they do not fully leverage face quality-related attributes, which are often abnormal in forged faces, and they lack effective training strategies for forgery-aware VLMs. In this paper, we extend the VQA dataset to create DD-VQA+, which features a richer set of attributes and a more diverse range of samples. Furthermore, we introduce a novel forgery detection framework, MGFFD-VLM, which integrates an Attribute-Driven Hybrid LoRA Strategy to enhance the capabilities of VLMs. Our framework also incorporates Multi-Granularity Prompt Learning and a Forgery-Aware Training Strategy. By transforming classification and forgery segmentation results into prompts, our method improves both forgery classification and interpretability. To further boost detection performance, we design multiple forgery-related auxiliary losses. Experimental results demonstrate that our approach surpasses existing methods in both text-based forgery judgment and analysis.
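The abstract states that classification and forgery segmentation results are transformed into prompts, without giving the template. One way to picture this is a sketch that turns an image-level forgery probability and a binary mask into a two-level text prompt. The 3x3 region grid, the wording of the template, and the names `region_name` and `build_prompt` are hypothetical and are not taken from the paper.

```python
from typing import List


def region_name(row: int, col: int, height: int, width: int) -> str:
    """Map a pixel position to a coarse 3x3 grid label (hypothetical scheme)."""
    vertical = ["upper", "middle", "lower"][min(3 * row // height, 2)]
    horizontal = ["left", "center", "right"][min(3 * col // width, 2)]
    return f"{vertical}-{horizontal}"


def build_prompt(fake_prob: float, mask: List[List[int]]) -> str:
    """Combine an image-level score and a pixel-level mask into one text prompt."""
    height, width = len(mask), len(mask[0])
    flagged = [(r, c) for r in range(height) for c in range(width) if mask[r][c]]

    # Coarse granularity: the classifier's image-level estimate.
    image_level = (
        f"A classifier estimates the probability that this face is forged at {fake_prob:.2f}."
    )

    # Fine granularity: where the segmentation model places the manipulation.
    if not flagged:
        region_level = "A segmentation model flags no manipulated region."
    else:
        regions = sorted({region_name(r, c, height, width) for r, c in flagged})
        ratio = len(flagged) / (height * width)
        region_level = (
            f"A segmentation model flags {ratio:.0%} of the image, "
            f"in the following areas: {', '.join(regions)}."
        )

    return f"{image_level} {region_level} Is this face a forgery, and why?"


# Usage with a toy 3x3 mask whose center pixel is flagged.
prompt = build_prompt(0.90, [[0, 0, 0], [0, 1, 0], [0, 0, 0]])
```

The resulting string would be prepended to the question given to the VLM, so the language model can ground its explanation in the detector outputs.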

Problem

Research questions and friction points this paper is trying to address.

Enhancing face forgery detection using multi-granularity prompts

Improving interpretability of forgery detection with VLMs

Addressing limitations in current forgery-aware VLM training

Innovation

Methods, ideas, or system contributions that make the work stand out.

Attribute-Driven Hybrid LoRA Strategy

Multi-Granularity Prompt Learning

Forgery-Aware Training Strategy
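The page names an Attribute-Driven Hybrid LoRA Strategy but does not describe its mechanics. For orientation, the sketch below shows the standard LoRA update, where a frozen weight W0 is augmented by a low-rank product scaled by alpha / rank, and then mixes several such adapters with per-adapter gate values. The gating by attribute is purely an assumption about what "hybrid" and "attribute-driven" could mean here; `LoRAAdapter`, `hybrid_lora_forward`, and the gate values are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRAAdapter:
    """Standard LoRA update: delta(x) = (alpha / rank) * B (A x)."""

    def __init__(self, a: Matrix, b: Matrix, alpha: float) -> None:
        self.a = a  # shape (rank, d_in)
        self.b = b  # shape (d_out, rank)
        self.scale = alpha / len(a)  # len(a) is the rank

    def delta(self, x: Sequence[float]) -> List[float]:
        return [self.scale * v for v in matvec(self.b, matvec(self.a, x))]


def hybrid_lora_forward(
    w0: Matrix,
    adapters: Sequence[LoRAAdapter],
    gates: Sequence[float],
    x: Sequence[float],
) -> List[float]:
    """Frozen base projection plus a gated sum of adapter updates.

    The gates are assumed to come from attribute information (for example,
    one adapter per attribute group); this mixing rule is an illustration only.
    """
    out = matvec(w0, x)
    for adapter, gate in zip(adapters, gates):
        out = [o + gate * d for o, d in zip(out, adapter.delta(x))]
    return out


# Usage with toy 2x2 weights and two rank-1 adapters.
w0 = [[1.0, 0.0], [0.0, 1.0]]
adapters = [
    LoRAAdapter(a=[[1.0, 0.0]], b=[[1.0], [0.0]], alpha=1.0),
    LoRAAdapter(a=[[0.0, 1.0]], b=[[0.0], [1.0]], alpha=2.0),
]
y = hybrid_lora_forward(w0, adapters, gates=[1.0, 0.5], x=[1.0, 2.0])
```

Setting every gate to zero recovers the frozen base model, which is a convenient sanity check for any adapter-mixing scheme.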

🔎 Similar Papers

Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection