DeepfakeBench-MM: A Comprehensive Benchmark for Multimodal Deepfake Detection

📅 2025-10-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal deepfake detection is currently held back by a lack of sufficiently diverse training data and the absence of a standardized evaluation benchmark. To address these challenges, we introduce Mega-MMDF, one of the largest and most diverse multimodal deepfake datasets to date, comprising about 100,000 real and 1.1 million forged samples generated by 21 forgery pipelines that combine 10 audio forgery, 12 visual forgery, and 6 audio-driven face reenactment methods. We further propose DeepfakeBench-MM, the first unified benchmark for multimodal deepfake detection, which standardizes protocols across the entire detection pipeline and currently supports 5 datasets and 11 multimodal detectors. Its evaluations and analyses, covering factors such as augmentation and stacked forgery, reveal key findings about what influences multimodal detection performance. Together, the dataset and benchmark give the community a diverse data foundation and a reproducible evaluation framework for advancing multimodal deepfake detection.

📝 Abstract
The misuse of advanced generative AI models has resulted in the widespread proliferation of falsified data, particularly forged human-centric audiovisual content, which poses substantial societal risks (e.g., financial fraud and social instability). In response to this growing threat, several works have preliminarily explored countermeasures. However, the lack of sufficient and diverse training data, along with the absence of a standardized benchmark, hinders deeper exploration. To address this challenge, we first build Mega-MMDF, a large-scale, diverse, and high-quality dataset for multimodal deepfake detection. Specifically, we employ 21 forgery pipelines through the combination of 10 audio forgery methods, 12 visual forgery methods, and 6 audio-driven face reenactment methods. Mega-MMDF currently contains 0.1 million real samples and 1.1 million forged samples, making it one of the largest and most diverse multimodal deepfake datasets, with plans for continuous expansion. Building on it, we present DeepfakeBench-MM, the first unified benchmark for multimodal deepfake detection. It establishes standardized protocols across the entire detection pipeline and serves as a versatile platform for evaluating existing methods as well as exploring novel approaches. DeepfakeBench-MM currently supports 5 datasets and 11 multimodal deepfake detectors. Furthermore, our comprehensive evaluations and in-depth analyses uncover several key findings from multiple perspectives (e.g., augmentation, stacked forgery). We believe that DeepfakeBench-MM, together with our large-scale Mega-MMDF, will serve as foundational infrastructure for advancing multimodal deepfake detection.
Problem

Research questions and friction points this paper is trying to address.

Addressing the lack of sufficient diverse training data for multimodal deepfake detection
Establishing a standardized benchmark for evaluating multimodal deepfake detection methods
Providing comprehensive evaluation of deepfake detectors across multiple forgery techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Built large-scale multimodal deepfake dataset Mega-MMDF
Established unified benchmark DeepfakeBench-MM for detection
Employed 21 forgery pipelines combining audio and visual methods
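
The sample counts quoted in the abstract (about 0.1 million real and 1.1 million forged) imply a strongly imbalanced dataset. The following is a minimal sketch of that arithmetic in Python. The function name `class_balance` and the inverse-frequency weighting scheme are illustrative assumptions, not something the paper is stated to use; the counts are the rounded figures from the abstract.

```python
# Rounded counts taken from the abstract ("0.1 million" real, "1.1 million" forged).
NUM_REAL = 100_000
NUM_FORGED = 1_100_000


def class_balance(num_real: int, num_forged: int) -> dict:
    """Summarize class imbalance and derive inverse-frequency class weights.

    Hypothetical helper for illustration only. The weights follow the common
    heuristic w_c = total / (num_classes * n_c) with two classes.
    """
    total = num_real + num_forged
    return {
        "total": total,
        "real_fraction": num_real / total,
        "forged_per_real": num_forged / num_real,
        "weight_real": total / (2 * num_real),
        "weight_forged": total / (2 * num_forged),
    }


stats = class_balance(NUM_REAL, NUM_FORGED)
for key, value in stats.items():
    print(f"{key}: {value}")
```

Under these rounded counts there are 11 forged samples per real one, so a detector trained without reweighting or resampling could score well simply by favoring the forged class. That is one reason threshold-independent metrics and balanced sampling are commonly considered in this setting.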
Kangran Zhao
The Chinese University of Hong Kong, Shenzhen
Yupeng Chen
The Chinese University of Hong Kong, Shenzhen
Xiaoyu Zhang
The Chinese University of Hong Kong, Shenzhen
Yize Chen
Assistant Professor, University of Alberta
Machine Learning, Power Systems, Optimization, Control
Weinan Guan
The Chinese University of Hong Kong, Shenzhen
Baicheng Chen
University of California San Diego
Metasurface, Metamaterial, Wireless Sensing, Mobile Health, Security/Privacy
Chengzhe Sun
University at Buffalo, State University of New York
Soumyya Kanti Datta
University at Buffalo, State University of New York
Qingshan Liu
Nanjing University of Posts and Telecommunications
Image and Video Analysis, Computer Vision, Pattern Recognition
Siwei Lyu
University at Buffalo, State University of New York
Baoyuan Wu
Associate Professor, CUHK-SZ
AI Security and Privacy, Machine Learning, Computer Vision, Optimization