Gender Fairness in Audio Deepfake Detection: Performance and Disparity Analysis

πŸ“… 2026-03-09
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This study addresses the issue of gender bias in audio deepfake detection, which is often obscured by conventional evaluation metrics. Leveraging the ASVspoof 5 dataset and employing ResNet-18 and AASIST models with four distinct audio feature representations, the work systematically introduces five fairness metrics for the first time to comprehensively assess performance disparities across gender groups. While the overall equal error rate (EER) exhibits only a minor gender gap, the fairness-aware metrics reveal significant gender-related error imbalances, demonstrating that standard evaluation protocols are insufficient for capturing model fairness. This research establishes a new paradigm and provides an empirical foundation for fairness-aware evaluation in audio spoofing detection, highlighting the necessity of incorporating fairness considerations into model assessment beyond aggregate accuracy measures.

πŸ“ Abstract
Audio deepfake detection aims to distinguish real human voices from those generated by Artificial Intelligence (AI) and has emerged as a significant problem in the field of voice biometrics. With the ever-improving quality of synthetic voices, the probability of such voices being exploited for illicit practices like identity theft and impersonation increases. Although significant progress has been made in audio deepfake detection in recent times, the issue of gender bias remains underexplored and in its nascent stage. In this paper, we present a thorough analysis of gender-dependent performance and fairness in audio deepfake detection models. We use the ASVspoof 5 dataset, train a ResNet-18 classifier, evaluate detection performance across four different audio features, and compare the results with the baseline AASIST model. Beyond conventional metrics such as Equal Error Rate (EER %), we incorporate five established fairness metrics to quantify gender disparities in the model. Our results show that even when the overall EER difference between genders appears low, fairness-aware evaluation reveals disparities in error distribution that are obscured by aggregate performance measures. These findings demonstrate that reliance on standard metrics alone is unreliable, whereas fairness metrics provide critical insights into demographic-specific failure modes. This work highlights the importance of fairness-aware evaluation for developing more equitable, robust, and trustworthy audio deepfake detection systems.
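The abstract contrasts aggregate EER with per-group evaluation: a small overall EER gap can coexist with unevenly distributed errors. As a rough illustration only (not the paper's code, and the variable names and the threshold-sweep EER approximation are assumptions), a sketch of computing EER per gender group and the absolute EER gap between groups:

```python
import numpy as np

def eer(scores, labels):
    """Approximate Equal Error Rate: sweep thresholds and return the
    operating point where false-acceptance and false-rejection rates
    are closest. labels: 1 = bona fide, 0 = spoof; higher score = more
    likely bona fide."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_gap, best_eer = np.inf, 1.0
    for t in np.sort(np.unique(scores)):
        far = np.mean(scores[labels == 0] >= t)  # spoof accepted as real
        frr = np.mean(scores[labels == 1] < t)   # real rejected as spoof
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer

def gender_eer_gap(scores, labels, genders):
    """Per-group EERs and the absolute EER difference between the two
    gender groups (a simple disparity measure, not one of the paper's
    five named fairness metrics)."""
    groups = {g: eer(scores[genders == g], labels[genders == g])
              for g in np.unique(genders)}
    vals = list(groups.values())
    return groups, abs(vals[0] - vals[1])
```

A disparity of zero here only says the two groups' EERs match; as the abstract notes, fairness metrics that look at the full error distribution can still expose imbalances that this single number hides.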
Problem

Research questions and friction points this paper is trying to address.

gender fairness
audio deepfake detection
bias
voice biometrics
fairness evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

gender fairness
audio deepfake detection
fairness metrics
bias analysis
voice biometrics
πŸ”Ž Similar Papers
No similar papers found.
Aishwarya Fursule
Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS, USA
Shruti Kshirsagar
Wichita State University
Deep Learning · Healthcare & AI · Signal Processing · Emotion Recognition · Deep Fake
Anderson R. Avila
Institut national de la recherche scientifique (INRS–EMT), Montreal, QC, Canada