A General Model for Deepfake Speech Detection: Diverse Bonafide Resources or Diverse AI-Based Generators

📅 2026-03-29

🤖 AI Summary

This study addresses the limited generalization of existing Deepfake Speech Detection (DSD) models, which often stems from training data drawn from a narrow set of Bonafide Resources (BR, the sources of genuine speech) or a narrow set of AI-based Generators (AG, the systems that synthesize fake speech). We show that balancing diversity across both the BR and AG dimensions is crucial for robust, generalizable detection. To this end, we construct a dataset that re-uses publicly available DSD datasets and explicitly balances real-speech origins against synthetic generation techniques. We first establish a deep-learning baseline, then train several models on the balanced dataset and evaluate them cross-dataset on multiple benchmarks. The results indicate that models trained on the balanced dataset generalize better across these benchmark test sets.
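The balancing idea can be illustrated with a minimal sketch. It assumes each utterance carries a label and a source name (a bonafide corpus for BR, a generator for AG) and caps every source at the same number of samples. The function name, the tuple layout, and the source names are hypothetical, and the actual construction procedure used for the dataset may differ.

```python
import random
from collections import defaultdict


def balance_by_source(items, per_source, seed=0):
    """Draw up to `per_source` utterances from every (label, source) group.

    `items` is a list of (utterance_id, label, source) tuples, where `label`
    is "bonafide" or "fake" and `source` names the bonafide corpus (BR) or
    the generator (AG). This layout is an assumption made for illustration.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for utt_id, label, source in items:
        groups[(label, source)].append(utt_id)

    balanced = []
    # Sort the groups so the result is reproducible for a given seed.
    for (label, source), utt_ids in sorted(groups.items()):
        rng.shuffle(utt_ids)
        for utt_id in utt_ids[:per_source]:
            balanced.append((utt_id, label, source))
    return balanced


# Hypothetical, heavily skewed pool: one large corpus and one large generator.
items = (
    [(f"a{i}", "bonafide", "corpus_A") for i in range(100)]
    + [(f"b{i}", "bonafide", "corpus_B") for i in range(5)]
    + [(f"x{i}", "fake", "tts_X") for i in range(50)]
    + [(f"y{i}", "fake", "vc_Y") for i in range(8)]
)
subset = balance_by_source(items, per_source=5)
```

Capping at the size of the smallest group discards data from the larger sources; oversampling the smaller ones is an alternative with the opposite trade-off.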

📝 Abstract

In this paper, we analyze two main factors, the Bonafide Resource (BR) and the AI-based Generator (AG), that affect the performance and generality of a Deepfake Speech Detection (DSD) model. To this end, we first propose a deep-learning-based model, referred to as the baseline. We then conduct experiments on the baseline to show how the BR and AG factors affect the threshold score used to classify input audio as fake or bonafide at inference time. Given the experimental results, we propose a dataset that re-uses public DSD datasets and balances BR and AG. We then train various deep-learning-based models on the proposed dataset and conduct cross-dataset evaluation on different benchmark datasets. The cross-dataset evaluation results show that balancing BR and AG is the key factor in training a general DSD model.
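To make the threshold effect concrete, the sketch below picks a decision threshold at the equal error rate (EER) point and compares it across two bonafide sources. Choosing the threshold at the EER point is an assumption, since the abstract does not say how the threshold is set, and all scores and names are invented for illustration.

```python
def eer_threshold(bonafide_scores, fake_scores):
    """Return (threshold, eer), where a higher score means "more likely fake".

    Every observed score is tried as a candidate threshold; the one where the
    false-alarm rate and the miss rate are closest is kept.
    """
    best = None
    for t in sorted(set(bonafide_scores) | set(fake_scores)):
        # Bonafide audio wrongly flagged as fake (score >= t).
        false_alarm = sum(s >= t for s in bonafide_scores) / len(bonafide_scores)
        # Fake audio wrongly accepted as bonafide (score < t).
        miss = sum(s < t for s in fake_scores) / len(fake_scores)
        gap = abs(false_alarm - miss)
        if best is None or gap < best[0]:
            best = (gap, t, (false_alarm + miss) / 2)
    return best[1], best[2]


# Invented scores: the same fake set scored against two bonafide sources.
fake = [0.6, 0.7, 0.8, 0.9]
bonafide_seen = [0.1, 0.2, 0.3, 0.4]    # source resembling the training data
bonafide_unseen = [0.5, 0.6, 0.7, 0.8]  # unseen source, scores drift upward

t_seen, eer_seen = eer_threshold(bonafide_seen, fake)
t_unseen, eer_unseen = eer_threshold(bonafide_unseen, fake)
```

In this toy case the threshold that separates the seen source cleanly is too low for the unseen one, which is the kind of shift a cross-dataset evaluation is meant to expose.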

Problem

Research questions and friction points this paper is trying to address.

Deepfake Speech Detection

Bonafide Resource

AI-based Generator

Model Generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Deepfake Speech Detection

Bonafide Resource

AI-based Generator

Cross-dataset Evaluation

Generalization

🔎 Similar Papers

A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection

2024-09-23 · arXiv.org · Citations: 1