🤖 AI Summary
This study addresses the significant performance degradation of existing deepfake detection methods on real-world videos of low to moderate visual quality. To this end, the authors systematically compare the detection capabilities of 200 human participants against 95 state-of-the-art AI detectors on both the standard DF40 benchmark and a newly curated dataset, CharadesDF, comprising everyday smartphone-captured videos. The work reveals, for the first time, a complementary relationship between human and AI error patterns in deepfake detection and proposes an integrated human-AI strategy that substantially reduces high-confidence misclassifications. Experimental results demonstrate that human accuracy on CharadesDF reaches 0.784, markedly surpassing the AI performance of 0.537, thereby underscoring the necessity and efficacy of human-AI collaboration for robust deepfake detection in realistic scenarios.
📝 Abstract
Deepfake detection is widely framed as a machine learning problem, yet how humans and AI detectors compare under realistic conditions remains poorly understood. We evaluate 200 human participants and 95 state-of-the-art AI detectors across two datasets: DF40, a standard benchmark, and CharadesDF, a novel dataset of videos of everyday activities. CharadesDF was recorded with mobile phones, yielding low- to moderate-quality videos, in contrast to the more professionally captured DF40. Humans outperform AI detectors on both datasets, and the gap widens on CharadesDF, where AI accuracy collapses to near chance (0.537) while humans maintain robust performance (0.784). Human and AI errors are complementary: humans miss high-quality deepfakes while AI detectors flag authentic videos as fake, and hybrid human-AI ensembles reduce high-confidence errors. These findings suggest that effective real-world deepfake detection, especially for non-professionally produced videos, requires human-AI collaboration rather than AI algorithms alone.
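The abstract's core claim, that complementary error patterns make a hybrid human-AI ensemble more reliable than either source alone, can be illustrated with a minimal sketch. This is not the paper's actual fusion method; the `hybrid_score` function, the 0.5/0.5 weighting, and the `ai_confidence_margin` threshold are all illustrative assumptions, motivated by the observation that AI scores collapse to near chance (≈0.5) on low-quality videos:

```python
def hybrid_score(human_prob: float, ai_prob: float,
                 ai_confidence_margin: float = 0.15) -> float:
    """Return a fused probability that a video is fake.

    human_prob -- fraction of human raters labeling the video fake (0..1)
    ai_prob    -- AI detector's predicted probability of fake (0..1)
    ai_confidence_margin -- how far from 0.5 (chance) the AI score must
                            be before it is trusted (assumed value)
    """
    if abs(ai_prob - 0.5) < ai_confidence_margin:
        # AI is near chance, as observed on low-quality footage:
        # defer to the human majority vote.
        return human_prob
    # Otherwise average the two complementary sources equally.
    return 0.5 * (human_prob + ai_prob)


def classify(human_prob: float, ai_prob: float) -> str:
    """Binary decision from the fused score (0.5 decision threshold)."""
    return "fake" if hybrid_score(human_prob, ai_prob) >= 0.5 else "real"
```

For example, `classify(0.8, 0.52)` defers to the confident human majority because the AI score sits near chance, while `classify(0.9, 0.7)` averages two agreeing sources. The deferral rule is one simple way to cut high-confidence errors: a near-chance AI score is never allowed to override a strong human consensus.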