From Prediction to Explanation: Multimodal, Explainable, and Interactive Deepfake Detection Framework for Non-Expert Users

📅 2025-08-10
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Current deepfake detection models are predominantly black-box systems with poor interpretability, which hinders trustworthy decision-making by non-expert users. To address this, we propose DF-P2E, a framework that integrates Grad-CAM-based saliency visualization, vision-to-language description generation, and large language model (LLM) fine-tuning for narrative refinement, establishing a multimodal, interactive explanation system tailored to lay users. DF-P2E delivers hierarchical, context-aware, and user-sensitive explanations at three levels: visual (attention maps), semantic (image-grounded captions), and narrative (coherent natural-language reasoning), enabling a seamless transition from model prediction to human-understandable inference. Evaluated on the DF40 benchmark, DF-P2E achieves detection accuracy competitive with state-of-the-art methods while generating natural-language explanations tightly aligned with the visual evidence, thereby improving model transparency, user comprehension, and human-AI collaborative decision-making.
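The visual layer described above is standard Grad-CAM over the classifier's last convolutional features. Below is a minimal sketch of that step; the ResNet-50 backbone, target layer, and class indexing are illustrative assumptions, since the summary does not specify the detector's architecture.

```python
# Minimal Grad-CAM sketch for a binary deepfake classifier. The ResNet-50
# backbone and target layer are assumptions for illustration; the paper's
# actual detector architecture is not specified in this summary.
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 2)  # class 0: real, 1: fake
model.eval()

feats, grads = {}, {}

def _capture(module, inputs, output):
    # Keep the last conv block's activations and register a tensor hook
    # that records their gradient during the backward pass.
    feats["a"] = output
    output.register_hook(lambda g: grads.update(a=g))

model.layer4.register_forward_hook(_capture)

def grad_cam(image: torch.Tensor) -> torch.Tensor:
    """Return an HxW saliency map in [0, 1] for the 'fake' class."""
    logits = model(image.unsqueeze(0))                   # [1, 2]
    model.zero_grad()
    logits[0, 1].backward()                              # backprop the fake logit
    weights = grads["a"].mean(dim=(2, 3), keepdim=True)  # GAP of the gradients
    cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[1:], mode="bilinear",
                        align_corners=False)[0, 0]
    return ((cam - cam.min()) / (cam.max() - cam.min() + 1e-8)).detach()

heatmap = grad_cam(torch.randn(3, 224, 224))  # dummy input, for shapes only
```

Overlaying the normalised map on the input image yields the attention visualisation shown to users.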

📝 Abstract
The proliferation of deepfake technologies poses urgent challenges and serious risks to digital integrity, particularly within critical sectors such as forensics, journalism, and the legal system. While existing detection systems have made significant progress in classification accuracy, they typically function as black-box models, offering limited transparency and minimal support for human reasoning. This lack of interpretability hinders their usability in real-world decision-making contexts, especially for non-expert users. In this paper, we present DF-P2E (Deepfake: Prediction to Explanation), a novel multimodal framework that integrates visual, semantic, and narrative layers of explanation to make deepfake detection interpretable and accessible. The framework consists of three modular components: (1) a deepfake classifier with Grad-CAM-based saliency visualisation, (2) a visual captioning module that generates natural language summaries of manipulated regions, and (3) a narrative refinement module that uses a fine-tuned Large Language Model (LLM) to produce context-aware, user-sensitive explanations. We instantiate and evaluate the framework on the DF40 benchmark, the most diverse deepfake dataset to date. Experiments demonstrate that our system achieves competitive detection performance while providing high-quality explanations aligned with Grad-CAM activations. By unifying prediction and explanation in a coherent, human-aligned pipeline, this work offers a scalable approach to interpretable deepfake detection, advancing the broader vision of trustworthy and transparent AI systems in adversarial media environments.
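Taken together, the three modular components compose into a single predict-then-explain pipeline. The sketch below renders that flow at the interface level; every name in it is hypothetical, since the abstract does not expose the concrete captioning model, LLM checkpoint, or prompt used by DF-P2E.

```python
# Interface-level sketch of the three-stage DF-P2E flow described in the
# abstract. The stage callables and the prompt wording are assumptions.
from dataclasses import dataclass

@dataclass
class Explanation:
    label: str         # "real" or "fake"
    confidence: float  # classifier softmax score
    caption: str       # semantic layer: what the salient region shows
    narrative: str     # narrative layer: lay-reader explanation

def explain(image, classifier, captioner, llm) -> Explanation:
    """Chain the three modules: predict -> localise -> describe -> narrate."""
    label, confidence, saliency = classifier(image)  # visual layer
    caption = captioner(image, saliency)             # semantic layer
    prompt = (
        f"The detector labelled this image '{label}' "
        f"(confidence {confidence:.0%}). The highlighted region shows: "
        f"{caption}. Explain the verdict to a non-expert in two sentences."
    )
    return Explanation(label, confidence, caption, llm(prompt))

# Toy usage with stub modules, purely for illustration:
verdict = explain(
    image=None,
    classifier=lambda img: ("fake", 0.92, None),
    captioner=lambda img, sal: "blended skin texture around the jawline",
    llm=lambda prompt: "The image was likely manipulated ...",
)
```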
Problem

Research questions and friction points this paper is trying to address.

Black-box detectors offer limited transparency and interpretability
Non-expert users lack support for trustworthy, real-world decision-making
Predictions arrive without visual, semantic, or narrative explanations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal framework integrating visual, semantic, and narrative layers
Grad-CAM-based saliency visualization for the deepfake classifier
Fine-tuned LLM for context-aware, user-sensitive explanations (sketched below)
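One plausible way to realise the fine-tuned LLM mentioned above is instruction-style supervision that pairs detector outputs and region captions with human-written lay explanations. The record below is a hypothetical training example in that spirit; the paper's actual fine-tuning data format is not shown in this summary.

```python
# Hypothetical instruction-tuning record for the narrative module. The field
# names and wording are illustrative, not the paper's actual format.
import json

record = {
    "instruction": "Explain this deepfake verdict to a non-expert reader.",
    "input": "Verdict: fake (92%). Salient region: blended skin texture "
             "around the jawline with inconsistent lighting.",
    "output": "The detector flagged this image as manipulated because the "
              "skin along the jawline looks artificially smoothed and the "
              "lighting there does not match the rest of the face.",
}
print(json.dumps(record, indent=2))
```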