Are audio DeepFake detection models polyglots?

📅 2024-12-23
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study reveals a significant language dependency in audio DeepFake detection models: state-of-the-art English-pretrained models exhibit severe generalization failure on non-English languages.

Method: To address this, we introduce the first multilingual benchmark for audio DeepFake detection, systematically evaluating cross-lingual transfer performance across 12 languages. Building upon mainstream detection architectures, we compare cross-lingual adaptation strategies, including fine-tuning, feature alignment, and data augmentation, while rigorously controlling target-language data volume to quantify its impact.

Contribution/Results: Experiments demonstrate that (1) detection accuracy varies substantially across languages; (2) even minimal target-language data yields substantial accuracy gains; and (3) language-aware modeling is critical for multilingual robustness. This work provides the first empirical evidence of fundamental language bias in audio DeepFake detection, establishing both theoretical foundations and practical pathways toward truly multilingual-robust detection systems.
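The benchmark described above compares detection performance per language. The paper's exact metric is not stated on this page; the sketch below assumes the Equal Error Rate (EER), the standard metric in audio DeepFake detection, and uses hypothetical per-language score sets (`languages`, `compute_eer` are illustrative names, not from the paper).

```python
import numpy as np

def compute_eer(bona_fide_scores, spoof_scores):
    """Equal Error Rate: operating point where the false-acceptance rate
    (spoofs accepted) equals the false-rejection rate (bona fide rejected)."""
    thresholds = np.sort(np.concatenate([bona_fide_scores, spoof_scores]))
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])
    frr = np.array([(bona_fide_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))  # closest crossing of the two curves
    return (far[idx] + frr[idx]) / 2

# Hypothetical detector scores per language (higher = more likely bona fide).
# Here "en" is simulated as better separated than "de", mimicking the kind of
# per-language gap the paper reports; real scores would come from a detector.
rng = np.random.default_rng(0)
languages = {
    "en": (rng.normal(1.0, 1.0, 500), rng.normal(-1.0, 1.0, 500)),
    "de": (rng.normal(0.5, 1.0, 500), rng.normal(-0.5, 1.0, 500)),
}
for lang, (bona, spoof) in languages.items():
    print(f"{lang}: EER = {compute_eer(bona, spoof):.3f}")
```

Comparing such per-language EERs, with and without target-language adaptation data, is the kind of controlled evaluation the benchmark describes.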

📝 Abstract
Since the majority of audio DeepFake (DF) detection methods are trained on English-centric datasets, their applicability to non-English languages remains largely unexplored. In this work, we present a benchmark for the multilingual audio DF detection challenge by evaluating various adaptation strategies. Our experiments focus on analyzing models trained on English benchmark datasets, as well as intra-linguistic (same-language) and cross-linguistic adaptation approaches. Our results indicate considerable variation in detection efficacy, highlighting the difficulties of multilingual settings. We show that limiting training data to English negatively impacts efficacy, and we stress the importance of data in the target language.
Problem

Research questions and friction points this paper is trying to address.

Evaluating audio DeepFake detection in non-English languages
Assessing cross-linguistic adaptation strategies for multilingual detection
Analyzing performance gaps in English-centric vs. target-language data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark for multilingual audio DeepFake detection
Analysis of English-trained models and cross-linguistic adaptation strategies
Demonstration that target-language data is important for detection efficacy
👥 Authors

Bartłomiej Marek
Wrocław University of Science and Technology, Poland; CISPA – Helmholtz Center for Information Security, Germany

Piotr Kawa
Wrocław University of Science and Technology
DeepFake detection, Speech processing, Image processing, Machine learning

Piotr Syga
Politechnika Wrocławska
Privacy, Biometrics, Signal Processing