🤖 AI Summary
Deepfakes pose serious threats to digital media authenticity and identity verification, yet existing detection methods lack rigorous evaluation under realistic conditions. Method: We propose a practical evaluation paradigm incorporating common post-processing perturbations—such as JPEG compression and contrast enhancement—and construct a large-scale benchmark dataset of 500,000 high-fidelity deepfake images generated by state-of-the-art diffusion and GAN-based models. Using AUC as the primary metric, we systematically assess 12 representative detectors. Results: Fewer than half achieve AUC > 60%; several perform near chance level. Crucially, our experiments reveal a catastrophic performance drop across all detectors when subjected to real-world image processing pipelines—uncovering their severe vulnerability to post-processing distortions. This work establishes a new standard for practical, application-oriented evaluation of deepfake detection and identifies robustness to common corruptions as a critical research direction for advancing deployable solutions.
📝 Abstract
Deepfakes powered by advanced machine learning models present a significant and evolving threat to identity verification and the authenticity of digital media. Although numerous detectors have been developed to address this problem, their effectiveness remains largely untested on real-world data. In this work, we evaluate modern deepfake detectors, introducing a novel testing procedure designed to mimic real-world scenarios for deepfake detection. Using state-of-the-art deepfake generation methods, we create a comprehensive dataset containing more than 500,000 high-quality deepfake images. Our analysis shows that detecting deepfakes remains a challenging task: fewer than half of the detectors tested achieved an AUC score greater than 60%, with the lowest at 50%, no better than chance. We demonstrate that basic image manipulations, such as JPEG compression or image enhancement, can significantly reduce model performance. All code and data are publicly available at https://github.com/messlav/Deepfake-Detectors-in-the-Wild.