MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM

📅 2025-07-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current vision-language models (VLMs) for face forgery detection can classify authenticity and generate explanations, but they overlook anomalous patterns in facial quality attributes and lack forgery-aware training mechanisms. To address this, we propose a forgery-perception-oriented multi-granularity prompting framework. First, we construct DD-VQA+, a new dataset annotated with fine-grained facial quality attributes. Second, we design an attribute-driven hybrid LoRA fine-tuning strategy to make the model more sensitive to forgery cues. Third, we introduce multiple auxiliary losses explicitly aligned with forgery characteristics. Finally, we convert classification and segmentation outputs into interpretable textual prompts within an end-to-end pipeline. Experiments show that the method outperforms existing approaches on text-based forgery judgment and analysis, improving both detection accuracy and the plausibility of the generated explanations.
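The final step above, turning classification and segmentation outputs into textual prompts, can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the function names `region_coverage` and `build_forgery_prompt`, the per-region binary masks, both thresholds, and the prompt wording are hypothetical, since the paper's actual prompt templates are not described here. The sketch only shows the general idea of combining a coarse image-level verdict with finer region-level evidence in one prompt.

```python
from typing import Dict, List

Mask = List[List[int]]  # hypothetical binary mask: 1 = pixel predicted as manipulated


def region_coverage(mask: Mask) -> float:
    """Fraction of pixels in a region that the (assumed) segmentation head flags."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    return sum(sum(row) for row in mask) / total


def build_forgery_prompt(fake_prob: float,
                         region_masks: Dict[str, Mask],
                         cls_threshold: float = 0.5,
                         area_threshold: float = 0.1) -> str:
    """Hypothetical prompt builder mixing two granularities of forgery evidence."""
    # Coarse granularity: image-level verdict from the binary classifier score.
    verdict = "likely forged" if fake_prob >= cls_threshold else "likely real"
    coarse = f"The classifier rates this face as {verdict} (p_fake={fake_prob:.2f})."
    # Fine granularity: regions whose predicted mask covers enough of the region.
    flagged = [name for name, mask in region_masks.items()
               if region_coverage(mask) >= area_threshold]
    if flagged:
        fine = "Suspected manipulated regions: " + ", ".join(flagged) + "."
    else:
        fine = "No region exceeds the manipulation-mask threshold."
    return coarse + " " + fine


masks = {"eyes": [[1, 1], [0, 0]], "mouth": [[0, 0], [0, 0]]}
print(build_forgery_prompt(0.87, masks))
```

In this sketch the resulting string would be prepended to the user's question before it reaches the VLM; how MGFFD-VLM actually injects such prompts is not specified in this summary.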

📝 Abstract
Recent studies have used visual large language models (VLMs) to answer not only "Is this face a forgery?" but also "Why is this face a forgery?" These studies introduced forgery-related attributes, such as forgery location and type, to construct deepfake VQA datasets and train VLMs, achieving high accuracy while providing human-readable textual explanations. However, these methods still have limitations. For example, they do not fully exploit face quality-related attributes, which are often abnormal in forged faces, and they lack effective training strategies for forgery-aware VLMs. In this paper, we extend the VQA dataset to create DD-VQA+, which features a richer set of attributes and a more diverse range of samples. Furthermore, we introduce a novel forgery detection framework, MGFFD-VLM, which integrates an Attribute-Driven Hybrid LoRA Strategy to enhance the capabilities of VLMs. Our framework also incorporates Multi-Granularity Prompt Learning and a Forgery-Aware Training Strategy. By transforming classification and forgery segmentation results into prompts, our method not only improves forgery classification but also enhances interpretability. To further boost detection performance, we design multiple forgery-related auxiliary losses. Experimental results show that our approach surpasses existing methods in both text-based forgery judgment and analysis, achieving superior accuracy.
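The abstract states that multiple forgery-related auxiliary losses are added to boost detection, but it does not give their form. The following is a minimal sketch under stated assumptions: it supposes a text-generation loss plus two hypothetical auxiliary terms, an image-level binary cross-entropy on the real/fake score and a pixel-wise binary cross-entropy on a forgery mask, combined with hypothetical weights `lambda_cls` and `lambda_seg`. The function names and the choice of BCE are illustrative, not the paper's confirmed losses.

```python
import math
from typing import List


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one probability p and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mean_pixel_bce(pred: List[List[float]], target: List[List[int]]) -> float:
    """Average BCE over all pixels of a predicted forgery mask (assumed aux loss)."""
    losses = [bce(p, y)
              for prow, trow in zip(pred, target)
              for p, y in zip(prow, trow)]
    if not losses:
        raise ValueError("empty mask")
    return sum(losses) / len(losses)


def total_loss(text_loss: float, fake_prob: float, label: int,
               mask_pred: List[List[float]], mask_gt: List[List[int]],
               lambda_cls: float = 1.0, lambda_seg: float = 1.0) -> float:
    """Hypothetical objective: language-modeling loss plus weighted auxiliary terms."""
    cls_term = bce(fake_prob, label)              # image-level real/fake supervision
    seg_term = mean_pixel_bce(mask_pred, mask_gt)  # pixel-level localization supervision
    return text_loss + lambda_cls * cls_term + lambda_seg * seg_term


print(total_loss(1.0, 0.5, 1, [[0.5]], [[1]]))  # 1 + 2 * ln 2
```

Setting both weights to zero recovers the plain text loss, which makes it easy to ablate each auxiliary term separately.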
Problem

Research questions and friction points this paper is trying to address.

Enhancing face forgery detection using multi-granularity prompts
Improving interpretability of forgery detection with VLMs
Addressing limitations in current forgery-aware VLM training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attribute-Driven Hybrid LoRA Strategy
Multi-Granularity Prompt Learning
Forgery-Aware Training Strategy
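The Attribute-Driven Hybrid LoRA Strategy builds on LoRA, which freezes a weight matrix W and learns a low-rank update (alpha / r) * B A. The sketch below shows that standard update in pure Python and then adds one hypothetical reading of "hybrid": several adapters, each tied to an attribute group, added on top of W with per-attribute gate weights. The class `LoRAAdapter`, the function `hybrid_lora_forward`, the attribute names, and the gating scheme are all assumptions for illustration; the summary does not describe how MGFFD-VLM actually combines its adapters.

```python
from typing import Dict, List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRAAdapter:
    """Standard LoRA update delta_W = (alpha / r) * B @ A, with A: r x d_in, B: d_out x r."""

    def __init__(self, A: Matrix, B: Matrix, alpha: float):
        self.A, self.B = A, B
        self.scale = alpha / len(A)  # len(A) is the rank r

    def delta(self, x: Vector) -> Vector:
        # Apply the low-rank path B (A x), then scale by alpha / r.
        return [self.scale * v for v in matvec(self.B, matvec(self.A, x))]


def hybrid_lora_forward(W: Matrix, x: Vector,
                        adapters: Dict[str, LoRAAdapter],
                        gates: Dict[str, float]) -> Vector:
    """Hypothetical 'hybrid' layer: frozen W plus gated, attribute-specific adapters."""
    y = matvec(W, x)  # frozen base projection
    for name, adapter in adapters.items():
        g = gates.get(name, 0.0)  # assumed per-attribute gate weight
        if g == 0.0:
            continue  # adapter not active for this input
        y = [yi + g * di for yi, di in zip(y, adapter.delta(x))]
    return y


W = [[1.0, 0.0], [0.0, 1.0]]
adapters = {"quality": LoRAAdapter(A=[[1.0, 1.0]], B=[[1.0], [0.0]], alpha=2.0)}
print(hybrid_lora_forward(W, [1.0, 2.0], adapters, {"quality": 0.5}))  # [4.0, 2.0]
```

Only the adapter matrices A and B would be trained in this setup; W stays frozen, which is what keeps LoRA fine-tuning cheap.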