🤖 AI Summary
Deepfake detection models frequently exhibit significant demographic bias, particularly across race and gender subgroups, undermining fairness. To address this, we propose Fair-FLIP, a plug-and-play post-processing fairness method. It analyses the variance of a trained model's final-layer inputs across demographic subgroups and reweights them, prioritising low-variability inputs while demoting highly variable ones, thereby suppressing bias amplification. Fair-FLIP requires no architectural modification or model retraining and is compatible with mainstream deepfake detectors. Experiments across multiple benchmark datasets show that Fair-FLIP incurs only a marginal 0.25% drop in overall detection accuracy while improving key fairness metrics, including equal opportunity difference and mean absolute error difference, by up to 30%. To our knowledge, this is the first approach to systematically mitigate group-level unfairness in deepfake detection without compromising strong detection performance.
📝 Abstract
Artificial Intelligence-generated content has become increasingly popular, yet its malicious use, particularly deepfakes, poses a serious threat to public trust and discourse. While deepfake detection methods achieve high predictive performance, they often exhibit biases across demographic attributes such as ethnicity and gender. In this work, we tackle the challenge of fair deepfake detection, aiming to mitigate these biases while maintaining robust detection capabilities. To this end, we propose a novel post-processing approach, referred to as Fairness-Oriented Final Layer Input Prioritising (Fair-FLIP), that reweights a trained model's final-layer inputs to reduce subgroup disparities, prioritising those with low variability while demoting highly variable ones. Experimental results comparing Fair-FLIP to both the baseline (without fairness-oriented de-biasing) and state-of-the-art approaches show that Fair-FLIP can enhance fairness metrics by up to 30% while maintaining baseline accuracy, with only a negligible reduction of 0.25%.
Code is available on GitHub: https://github.com/szandala/fair-deepfake-detection-toolbox
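The abstract does not spell out the exact prioritisation rule, so the following is only an illustrative sketch of the core idea: score each final-layer input feature by how much its mean activation varies across demographic subgroups, then weight features inversely to that variance before the (frozen) final classifier. The function names, the inverse-variance form, and the normalisation are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def fair_flip_weights(features, subgroups, eps=1e-8):
    """Illustrative weighting: demote final-layer inputs whose mean
    activation varies strongly across demographic subgroups.

    features:  (n_samples, n_features) final-layer input activations
    subgroups: (n_samples,) demographic subgroup label per sample
    """
    # Mean activation of each feature within each subgroup
    group_means = np.stack(
        [features[subgroups == g].mean(axis=0) for g in np.unique(subgroups)]
    )
    # Variance of the subgroup means per feature: a high value means the
    # feature's activation level encodes subgroup membership
    across_group_var = group_means.var(axis=0)
    # Inverse-variance weighting: prioritise low-variability features,
    # demote highly variable ones (the eps guard avoids division by zero)
    weights = 1.0 / (across_group_var + eps)
    # Normalise so the weights have mean 1 and do not rescale the logits
    return weights / weights.sum() * features.shape[1]

def reweighted_logits(features, weights, W, b):
    """Apply the feature weights to the final-layer inputs, then run the
    trained (unchanged) linear classifier W, b."""
    return (features * weights) @ W + b
```

In this reading, only the element-wise weights are new; the trained classifier `W, b` and the backbone producing `features` stay untouched, which matches the paper's claim of requiring no retraining or architectural change.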