SCDF: A Speaker Characteristics DeepFake Speech Dataset for Bias Analysis

📅 2025-08-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Deepfake speech detection research has not systematically evaluated bias and fairness, largely because no benchmark dataset covers diverse speaker attributes such as gender, age, language, and synthesizer type. Method: We introduce a large-scale, multilingual deepfake speech dataset with balanced gender and age distributions. It is the first to systematically annotate demographic attributes and to integrate high-fidelity synthetic samples generated by multiple state-of-the-art text-to-speech and voice-cloning systems. Contribution/Results: Experiments reveal substantial performance disparities across demographic groups in prevailing detection models; for example, cross-gender and cross-lingual accuracy gaps reach 12–28%. These findings support the dataset’s utility for quantifying algorithmic bias and advancing fairness-aware modeling. The dataset establishes a new benchmark and analytical framework for trustworthy deepfake speech detection.

📝 Abstract

Despite growing attention to deepfake speech detection, bias and fairness remain underexplored in the speech domain. To address this gap, we introduce the Speaker Characteristics Deepfake (SCDF) dataset: a novel, richly annotated resource enabling systematic evaluation of demographic biases in deepfake speech detection. SCDF contains over 237,000 utterances, balanced between male and female speakers and spanning five languages and a wide age range. We evaluate several state-of-the-art detectors and show that speaker characteristics significantly influence detection performance, revealing disparities across sex, language, age, and synthesizer type. These findings highlight the need for bias-aware development and provide a foundation for building non-discriminatory deepfake detection systems aligned with ethical and regulatory standards.
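
The kind of disparity analysis described above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the record fields (`sex`, `language`, `label`, `predicted`), the language codes, and the toy records are all assumptions made for illustration, and plain accuracy stands in for whatever metrics the paper actually reports. The idea is to score a detector separately for each value of a speaker attribute and then look at the spread between the best and worst group.

```python
from collections import defaultdict


def group_accuracy(records, attribute):
    """Return {group_value: accuracy} for one speaker attribute."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[attribute]
        total[group] += 1
        if rec["predicted"] == rec["label"]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Difference between the best and worst group accuracy."""
    return max(per_group.values()) - min(per_group.values())


# Hypothetical detector outputs; field names and values are illustrative only.
records = [
    {"sex": "female", "language": "en", "label": "fake", "predicted": "fake"},
    {"sex": "female", "language": "de", "label": "real", "predicted": "fake"},
    {"sex": "male", "language": "en", "label": "fake", "predicted": "fake"},
    {"sex": "male", "language": "de", "label": "real", "predicted": "real"},
]

by_sex = group_accuracy(records, "sex")
by_language = group_accuracy(records, "language")
print("accuracy by sex:", by_sex, "gap:", accuracy_gap(by_sex))
print("accuracy by language:", by_language, "gap:", accuracy_gap(by_language))
```

A single gap number hides which group is disadvantaged, so the per-group dictionary is worth reporting alongside it.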

Problem

Research questions and friction points this paper is trying to address.

Addressing bias in deepfake speech detection systems

Evaluating demographic disparities in detection performance

Developing ethical, non-discriminatory deepfake detection standards

Innovation

Methods, ideas, or system contributions that make the work stand out.

Richly annotated SCDF dataset for bias analysis

Evaluates demographic biases in deepfake speech detection

Supports bias-aware deepfake detection system development
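
Because the dataset's value for bias analysis rests on its annotations being balanced, a quick balance check is a natural first step before any evaluation. The sketch below assumes a hypothetical metadata layout (one dictionary per utterance with `sex` and `language` keys); the actual SCDF file format and field names are not given in this summary.

```python
from collections import Counter


def attribute_counts(metadata, attribute):
    """Count utterances per value of one annotated attribute."""
    return Counter(row[attribute] for row in metadata)


def imbalance_ratio(counts):
    """Largest group size divided by smallest; 1.0 means perfectly balanced."""
    return max(counts.values()) / min(counts.values())


# Hypothetical per-utterance metadata; keys and values are illustrative only.
metadata = [
    {"sex": "female", "language": "en"},
    {"sex": "male", "language": "en"},
    {"sex": "female", "language": "de"},
    {"sex": "male", "language": "de"},
    {"sex": "female", "language": "en"},
    {"sex": "male", "language": "en"},
]

sex_counts = attribute_counts(metadata, "sex")
language_counts = attribute_counts(metadata, "language")
print("sex:", dict(sex_counts), "ratio:", imbalance_ratio(sex_counts))
print("language:", dict(language_counts), "ratio:", imbalance_ratio(language_counts))
```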

🔎 Similar Papers

A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection

2024-09-23 · arXiv.org · Citations: 1