AI Summary
This study addresses significant fairness disparities across gender groups in audio deepfake detection systems, and the lack of any systematic diagnosis of where those disparities originate. The work proposes the first "diagnose-then-mitigate" framework and finds that bias stems from acoustic representation differences, gender information leakage in learned features, and asymmetric evaluation protocols. Based on these insights, the authors introduce gender-specific decision threshold tuning, a novel epoch-level fairness regularization method, and an adversarial debiasing strategy. Experiments on the ASVSpoof5 dataset with the AASIST and Wav2Vec2+ResNet18 models show that gender-aware threshold adjustment reduces unfairness by 54% to 75% without compromising detection accuracy, and that the proposed regularization outperforms existing batch-level approaches. The study underscores the critical role of precise bias diagnosis in selecting effective debiasing interventions.
Abstract
Audio deepfake detection systems are increasingly deployed in high-stakes security applications, yet their fairness across demographic groups remains critically underexamined. Prior work measures gender disparity but does not investigate where it comes from or how to fix it systematically. We present the first diagnosis-first framework that identifies the sources of bias before applying targeted mitigation, evaluated with two models, AASIST and Wav2Vec2+ResNet18, on ASVSpoof5. Our diagnosis shows that bias does not stem from imbalanced training data but from acoustic representation differences, gender leakage in learned features, and structural evaluation asymmetry. We test mitigation strategies across the in-processing, post-processing, and combined families, including novel methods introduced in this work. Adjusting the decision threshold separately per gender reduces unfairness by 54% to 75% at no cost to detection accuracy, and our new epoch-level fairness regularisation method outperforms existing per-batch approaches. Adversarial debiasing succeeds only when gender leakage is localised and fails when it is diffuse, an outcome correctly predicted by our diagnosis before training. No single method fully closes the fairness gap, confirming that bias sources must be identified before fixes are applied and that fairer benchmark design is equally important.
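The per-gender threshold adjustment described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the fairness metric or the tuning rule, so the sketch assumes that unfairness is measured as the gap in false-positive rate (bona fide speech flagged as spoof) between groups, and that each group's threshold is set to hit a common target rate. The function names, the synthetic scores, and the 5% target are all hypothetical.

```python
import numpy as np


def threshold_at_fpr(scores, labels, target_fpr):
    """Threshold at which roughly `target_fpr` of bona fide clips score above it.

    Assumed convention: higher score = more likely spoof; label 0 = bona fide, 1 = spoof.
    """
    bona_fide = scores[labels == 0]
    return float(np.quantile(bona_fide, 1.0 - target_fpr))


def false_positive_rate(scores, labels, threshold):
    """Fraction of bona fide clips that would be flagged as spoof."""
    bona_fide = scores[labels == 0]
    return float(np.mean(bona_fide > threshold))


def per_group_thresholds(scores, labels, groups, target_fpr):
    """One threshold per group, each tuned on that group's own scores."""
    return {
        str(g): threshold_at_fpr(scores[groups == g], labels[groups == g], target_fpr)
        for g in np.unique(groups)
    }


# Synthetic scores (an assumption for illustration only): bona fide clips from
# group "f" receive systematically higher spoof scores than those from group "m".
rng = np.random.default_rng(0)
n = 2000
groups = np.array(["f"] * (2 * n) + ["m"] * (2 * n))
labels = np.tile(np.array([0] * n + [1] * n), 2)
scores = np.concatenate([
    rng.normal(0.5, 1.0, n),   # f, bona fide
    rng.normal(3.0, 1.0, n),   # f, spoof
    rng.normal(0.0, 1.0, n),   # m, bona fide
    rng.normal(3.0, 1.0, n),   # m, spoof
])

target = 0.05
global_thr = threshold_at_fpr(scores, labels, target)
group_thr = per_group_thresholds(scores, labels, groups, target)

# False-positive rate per group under a single shared threshold.
fpr_global = {
    g: false_positive_rate(scores[groups == g], labels[groups == g], global_thr)
    for g in ("f", "m")
}
# False-positive rate per group under group-specific thresholds.
fpr_group = {
    g: false_positive_rate(scores[groups == g], labels[groups == g], group_thr[g])
    for g in ("f", "m")
}

gap_global = abs(fpr_global["f"] - fpr_global["m"])
gap_group = abs(fpr_group["f"] - fpr_group["m"])
print(f"FPR gap, shared threshold:    {gap_global:.3f}")
print(f"FPR gap, per-group threshold: {gap_group:.3f}")
```

Because the thresholds are applied after training, this kind of adjustment leaves the model weights untouched, which is consistent with the abstract's observation that it comes at no cost to detection accuracy. In practice the thresholds would be tuned on a held-out development set rather than on the evaluation data, and the sketch makes no attempt to reproduce the reported 54% to 75% reduction.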