🤖 AI Summary
Deepfake detection performance is often dominated by implementation details—such as preprocessing, data augmentation, and optimization—rather than model architecture itself, hindering fair benchmarking and obscuring key performance drivers.
Method: We conduct controlled ablation experiments to systematically evaluate design choices across training, inference, and incremental update stages, isolating the independent effects of data augmentation strategies, optimization scheduling, and inference protocols.
Contribution/Results: We derive a set of architecture-agnostic design principles, empirically validated to improve both accuracy and cross-dataset generalization. Evaluated on the AI-GenBench benchmark, detectors trained under these guidelines achieve state-of-the-art performance. The principles provide a methodological foundation for reproducible evaluation, fair model comparison, and sustainable iterative development of deepfake detectors.
📝 Abstract
The effectiveness of deepfake detection methods often depends less on their core design and more on implementation details such as data preprocessing, augmentation strategies, and optimization techniques. These dependencies make it difficult to compare detectors fairly and to understand which factors truly drive their performance. To address this, we systematically investigate how different design choices influence the accuracy and generalization capabilities of deepfake detection models, focusing on training, inference, and incremental updates. By isolating the impact of individual factors, we aim to establish robust, architecture-agnostic best practices for the design and development of future deepfake detection systems. Our experiments identify a set of design choices that consistently improve deepfake detection and enable state-of-the-art performance on the AI-GenBench benchmark.
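The controlled-ablation protocol described in the abstract can be pictured as enumerating a factorial grid of design choices, then comparing runs that differ in exactly one setting. The sketch below is only illustrative: the factor names and values are assumptions, not the paper's actual configuration space.

```python
from itertools import product

# Hypothetical design-choice factors (illustrative only; the paper's
# actual training/inference/update options are not reproduced here).
factors = {
    "augmentation": ["none", "jpeg+blur", "full"],
    "lr_schedule": ["constant", "cosine"],
    "inference": ["center_crop", "multi_crop"],
}

def ablation_configs(factors):
    """Enumerate every factor combination so the effect of each design
    choice can be isolated by comparing runs that differ in one setting."""
    keys = list(factors)
    for values in product(*(factors[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(ablation_configs(factors))
# 3 augmentations x 2 schedules x 2 inference protocols = 12 runs
print(len(configs))
```

Comparing pairs of configurations that share all but one key then attributes any accuracy or generalization gap to that single factor, which is the isolation logic the study relies on.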