🤖 AI Summary
The increasing realism of deepfake videos has significantly heightened the difficulty of reliable detection. Method: To address this, we introduce AV-Deepfake1M++—a large-scale, audio-visual deepfake benchmark explicitly designed for realistic web environments—comprising 2 million video clips. It integrates state-of-the-art text-to-speech and facial reenactment models and systematically incorporates 12 types of real-world corruptions—including compression, noise, and resolution degradation—to enable controllable modeling of complex interference. Contribution/Results: Compared to existing benchmarks, AV-Deepfake1M++ substantially enhances the diversity, photorealism, and ecological validity of forged samples, providing a more challenging and realistic foundation for training and evaluating detection algorithms. As the official benchmark for the 2025 1M-Deepfakes Detection Challenge, it advances the development of practical, robust deepfake detection systems.
📝 Abstract
The rapid surge of text-to-speech and face-voice reenactment models has made video fabrication easier and highly realistic. To counter this problem, we require datasets that are rich in generation methods and perturbation strategies, as such perturbations are common in online videos. To this end, we propose AV-Deepfake1M++, an extension of AV-Deepfake1M comprising 2 million video clips with diversified manipulation strategies and audio-visual perturbations. This paper describes the data generation strategies and benchmarks AV-Deepfake1M++ using state-of-the-art methods. We believe that this dataset will play a pivotal role in facilitating research in the deepfake domain. Based on this dataset, we host the 2025 1M-Deepfakes Detection Challenge. The challenge details, dataset, and evaluation scripts are available online under a research-only license at https://deepfakes1m.github.io/2025.