DF-LLaVA: Unlocking MLLM's potential for Synthetic Image Detection via Prompt-Guided Knowledge Injection

📅 2025-09-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing synthetic image detection methods struggle to simultaneously achieve high classification accuracy and human-interpretable forgery localization. Method: We propose DF-LLaVA, the first framework to explicitly inject discriminative knowledge—implicitly encoded in multimodal large language models (MLLMs)—into training via a prompt-guided knowledge injection mechanism. Built upon the LLaVA architecture, DF-LLaVA integrates prompt engineering with knowledge distillation to jointly perform fine-grained localization and generate natural-language explanations. Contribution/Results: On multiple benchmarks, DF-LLaVA surpasses state-of-the-art expert models in detection accuracy while producing human-readable reasoning texts. It is the first method to unify high precision and strong interpretability without compromising either, establishing a new paradigm for trustworthy AI-based content authentication.

Technology Category

Application Category

📝 Abstract
With the increasing prevalence of synthetic images, evaluating image authenticity and locating forgeries accurately while maintaining human interpretability remains a challenging task. Existing detection models primarily focus on simple authenticity classification, ultimately providing only a forgery probability or binary judgment, which offers limited explanatory insights into image authenticity. Moreover, while MLLM-based detection methods can provide more interpretable results, they still lag behind expert models in terms of pure authenticity classification accuracy. To address this, we propose DF-LLaVA, a simple yet effective framework that unlocks the intrinsic discrimination potential of MLLMs. Our approach first extracts latent knowledge from MLLMs and then injects it into training via prompts. This framework allows LLaVA to achieve outstanding detection accuracy exceeding expert models while still maintaining the interpretability offered by MLLMs. Extensive experiments confirm the superiority of our DF-LLaVA, achieving both high accuracy and explainability in synthetic image detection. Code is available online at: https://github.com/Eliot-Shen/DF-LLaVA.
Problem

Research questions and friction points this paper is trying to address.

Detecting synthetic images accurately with interpretable results
Improving MLLM detection accuracy to surpass expert models
Injecting prompt-guided knowledge to enhance discrimination capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prompt-guided knowledge injection from MLLMs
Exceeding expert models' detection accuracy
Maintaining interpretability while improving performance
🔎 Similar Papers
No similar papers found.
Z
Zhuokang Shen
K
Kaisen Zhang
Bohan Jia
Bohan Jia
East China Normal University
MLLMLLMAIGC
Y
Yuan Fang
Z
Zhou Yu
S
Shaohui Lin