Deepfake-Eval-2024: A Multi-Modal In-the-Wild Benchmark of Deepfakes Circulated in 2024

📅 2025-03-04
🤖 AI Summary
Rapid advancements in deepfake generation have outpaced mainstream academic detection benchmarks, leading to overly optimistic estimates of model generalization in real-world settings. Method: We introduce Deepfake-Eval-2024, an in-the-wild multimodal deepfake benchmark comprising 44 hours of video, 56.5 hours of audio, and 1,975 images, collected in 2024 from social media and users of a deepfake detection platform, spanning 88 websites and 52 languages. Contribution/Results: Experiments reveal drastic performance drops for open-source SOTA detectors: AUC declines by 50% (video), 48% (audio), and 45% (image) relative to prior benchmarks. Commercial and finetuned models perform notably better, but still fall short of human deepfake forensic analysts. The benchmark is publicly released to foster realistic, rigorous deepfake detection research.

📝 Abstract
In the age of increasingly realistic generative AI, robust deepfake detection is essential for mitigating fraud and disinformation. While many deepfake detectors report high accuracy on academic datasets, we show that these academic benchmarks are out of date and not representative of recent deepfakes. We introduce Deepfake-Eval-2024, a new deepfake detection benchmark consisting of in-the-wild deepfakes collected from social media and deepfake detection platform users in 2024. Deepfake-Eval-2024 consists of 44 hours of videos, 56.5 hours of audio, and 1,975 images, encompassing the latest manipulation technologies. The benchmark contains diverse media content from 88 different websites in 52 different languages. We find that the performance of open-source state-of-the-art deepfake detection models drops precipitously when evaluated on Deepfake-Eval-2024, with AUC decreasing by 50% for video, 48% for audio, and 45% for image models compared to previous benchmarks. We also evaluate commercial deepfake detection models and models finetuned on Deepfake-Eval-2024, and find that they have superior performance to off-the-shelf open-source models, but they do not yet reach the accuracy of human deepfake forensic analysts. The dataset is available at https://github.com/nuriachandra/Deepfake-Eval-2024.
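The headline results are reported as drops in ROC-AUC, the probability that a detector scores a randomly chosen fake higher than a randomly chosen real sample. As a minimal illustration (not the paper's released toolkit), the rank-based form of AUC can be computed directly from detector scores; the scores below are hypothetical:

```python
# Illustrative sketch: ROC-AUC for a binary deepfake detector via the
# rank (Mann-Whitney) formulation. Not the authors' evaluation code.

def roc_auc(labels, scores):
    """AUC = P(score of a fake > score of a real); ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # fakes
    neg = [s for y, s in zip(labels, scores) if y == 0]  # reals
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical detector scores: strong separation on an older academic
# benchmark vs. near-chance ranking on in-the-wild 2024 media.
academic_auc = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])  # 1.0
in_wild_auc  = roc_auc([1, 1, 0, 0], [0.6, 0.3, 0.7, 0.2])  # 0.5
```

An AUC of 0.5 corresponds to random ranking, which is why the reported ~50% relative declines indicate detectors losing most of their discriminative power on recent in-the-wild deepfakes.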
Problem

Research questions and friction points this paper is trying to address.

Address outdated deepfake detection benchmarks with new realistic data.
Evaluate deepfake detection models on diverse, real-world media content.
Assess performance gap between AI models and human forensic analysts.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Deepfake-Eval-2024 benchmark
Includes diverse multi-modal deepfake data
Evaluates state-of-the-art detection models
Nuria Alina Chandra
ML researcher, TrueMedia.org
Ryan Murtfeldt
TrueMedia.org, University of Washington, Seattle
Lin Qiu
TrueMedia.org, University of Washington, Seattle
Arnab Karmakar
Research Assistant, Reasoning, AI and Vision (RAIVN) Lab, University of Washington
Computer Vision · Deep Learning · Large Vision Language Models · Stable Video Diffusion
Hannah Lee
TrueMedia.org, University of Washington
Emmanuel Tanumihardja
TrueMedia.org, University of Washington, Seattle
Kevin Farhat
TrueMedia.org, University of Washington, Seattle
Ben Caffee
University of Washington
Deep Learning · Natural Language Processing · Computer Vision · Reinforcement Learning
Sejin Paik
Assistant Research Professor, Georgetown University
AI-Mediated Communication · Human-Centered AI · Social Media · Media Psychology · Social Computing
Changyeon Lee
Miraflow AI, Yonsei University, Seoul
Jongwook Choi
TrueMedia.org, Chung-Ang University, Seoul
Aerin Kim
TrueMedia.org, Miraflow AI
Oren Etzioni
University of Washington
AI