Deepfake Forensics Adapter: A Dual-Stream Network for Generalizable Deepfake Detection

📅 2026-03-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited generalization of current deepfake detection methods in the face of rapidly evolving forgery techniques. The authors propose a dual-stream network framework that, while keeping the parameters of a pretrained CLIP model frozen, effectively fuses its global semantic features with local anomaly cues—augmented by facial structural priors—through a Transformer-based interactive fusion mechanism to enhance detection of unseen forgery types. Evaluated on the DFDC dataset, the method achieves a frame-level AUC of 0.816 and a video-level AUC of 0.836, representing a 4.8% improvement in video-level AUC over the previous state-of-the-art, thereby significantly advancing cross-forgery-type generalization performance.

📝 Abstract
The rapid advancement of deepfake generation techniques poses significant threats to public safety and causes societal harm through the creation of highly realistic synthetic facial media. Because existing detection methods generalize poorly to emerging forgery patterns, this paper presents the Deepfake Forensics Adapter (DFA), a novel dual-stream framework that synergizes vision-language foundation models with targeted forensics analysis. Our approach integrates a pre-trained CLIP model, whose parameters remain frozen to preserve its powerful general-purpose representations, with three core components for specialized deepfake detection: 1) a Global Feature Adapter identifies global inconsistencies in image content that may indicate forgery; 2) a Local Anomaly Stream enhances the model's ability to perceive local facial forgery cues by explicitly leveraging facial structure priors; and 3) an Interactive Fusion Classifier promotes deep interaction and fusion between global and local features using a Transformer encoder. Extensive evaluations on frame-level and video-level benchmarks demonstrate the superior generalization of DFA, which achieves state-of-the-art performance on the challenging DFDC dataset with a frame-level AUC/EER of 0.816/0.256 and a video-level AUC/EER of 0.836/0.251, a 4.8% video-level AUC improvement over previous methods. Beyond its state-of-the-art results, our framework points to a feasible and effective direction for building robust deepfake detection systems with enhanced generalization against evolving deepfake threats. Our code is available at https://github.com/Liao330/DFA.git
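The three components above can be sketched as a single fusion module. This is a minimal illustrative sketch, not the authors' released code: the class name, layer sizes, token counts, and the use of a lightweight MLP adapter plus a 2-layer Transformer encoder are all assumptions; in the actual DFA, the global features would come from a frozen CLIP backbone and the local tokens from the facial-prior-guided anomaly stream.

```python
# Hypothetical sketch of DFA-style dual-stream fusion (names and sizes assumed).
import torch
import torch.nn as nn

class DualStreamFusion(nn.Module):
    """Fuses a global (CLIP-like, frozen-backbone) feature with local anomaly
    tokens via a small Transformer encoder, then classifies real vs. fake."""
    def __init__(self, global_dim=512, local_dim=256, fuse_dim=256, n_heads=4):
        super().__init__()
        # Global Feature Adapter: projects frozen CLIP features into the fusion space.
        self.global_adapter = nn.Sequential(
            nn.Linear(global_dim, fuse_dim), nn.GELU(), nn.Linear(fuse_dim, fuse_dim)
        )
        # Local stream projection (local anomaly cues, e.g. from a shallow CNN).
        self.local_proj = nn.Linear(local_dim, fuse_dim)
        # Interactive Fusion: self-attention over [global token; local tokens].
        layer = nn.TransformerEncoderLayer(d_model=fuse_dim, nhead=n_heads,
                                           batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(fuse_dim, 2)  # real / fake logits

    def forward(self, global_feat, local_tokens):
        g = self.global_adapter(global_feat).unsqueeze(1)  # (B, 1, D)
        l = self.local_proj(local_tokens)                  # (B, N, D)
        fused = self.fusion(torch.cat([g, l], dim=1))      # (B, 1+N, D)
        return self.classifier(fused[:, 0])                # read out the global token

# Toy usage: in practice the frozen backbone and local stream supply these features.
model = DualStreamFusion()
logits = model(torch.randn(2, 512), torch.randn(2, 49, 256))
print(logits.shape)  # torch.Size([2, 2])
```

Keeping the backbone frozen and training only the adapter, local stream, and fusion head is what lets the method inherit CLIP's general representations while specializing for forgery cues.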
Problem

Research questions and friction points this paper is trying to address.

deepfake detection
generalization
forgery patterns
synthetic media
forensics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deepfake Detection
Vision-Language Model
Adapter Architecture
Generalization
Dual-Stream Network
Jianfeng Liao
Shenzhen Technology University, Guangdong, China
Yichen Wei
SHUKUN Technology
deep learning, computer vision, medical image analysis
Raymond Chan Ching Bon
Singapore Institute of Technology, Singapore, Singapore
Shulan Wang
Shenzhen Technology University, Guangdong, China
Kam-Pui Chow
The University of Hong Kong, Hong Kong, China
Kwok-Yan Lam
Nanyang Technological University
Cybersecurity, Privacy-Preserving Technologies, Digital Trust, Distributed Systems, LegalTech