🤖 AI Summary
This study reveals a significant language dependency in audio DeepFake detection models: state-of-the-art English-pretrained models generalize poorly to non-English languages.
Method: To address this, we introduce the first multilingual benchmark for audio DeepFake detection, systematically evaluating cross-lingual transfer performance across 12 languages. Building upon mainstream detection architectures, we compare cross-lingual adaptation strategies—including fine-tuning, feature alignment, and data augmentation—while rigorously controlling target-language data volume to quantify its impact.
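To make the adaptation setup concrete, below is a minimal PyTorch sketch of the lowest-cost strategy in this family: freezing an English-pretrained backbone and fine-tuning only a new classification head on a small target-language set. The architecture, feature dimensions, data, and hyperparameters are illustrative stand-ins, not the paper's configuration.

```python
# Sketch of low-resource target-language adaptation (frozen backbone + new head).
# The backbone and dataset here are hypothetical stand-ins for an
# English-pretrained encoder and a small labelled target-language corpus.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in for an English-pretrained encoder (e.g. a wav2vec2-style model);
# modeled here as a frozen projection over precomputed 1024-dim features.
backbone = nn.Sequential(nn.Linear(1024, 256), nn.ReLU())
for p in backbone.parameters():
    p.requires_grad = False  # keep the English-pretrained weights fixed

head = nn.Linear(256, 2)  # bonafide vs. spoof classifier, trained from scratch

# Hypothetical "minimal target-language data" budget: 64 labelled utterances,
# represented by random feature vectors for the sake of a runnable example.
feats = torch.randn(64, 1024)
labels = torch.randint(0, 2, (64,))
loader = DataLoader(TensorDataset(feats, labels), batch_size=16, shuffle=True)

opt = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for x, y in loader:
        opt.zero_grad()
        logits = head(backbone(x))  # gradients flow only into the head
        loss = loss_fn(logits, y)
        loss.backward()
        opt.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```

Full fine-tuning or feature alignment would instead unfreeze (part of) the backbone or add an alignment loss; the frozen-head variant is shown because it is the cheapest point on the data/compute trade-off being studied.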
Contribution/Results: Experiments demonstrate that (1) detection accuracy varies substantially across languages; (2) even minimal target-language data yields substantial accuracy gains; and (3) language-aware modeling is critical for multilingual robustness. This work provides the first empirical evidence of fundamental language bias in audio DeepFake detection, establishing both theoretical foundations and practical pathways toward detection systems that are robust across languages.
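Detection efficacy in this field is conventionally reported as the equal error rate (EER), the operating point where false-acceptance and false-rejection rates coincide, and a per-language EER breakdown is the natural way to quantify result (1). The sketch below computes EER from detector scores; the language codes and score distributions are synthetic placeholders, not the paper's results.

```python
# Per-language EER computation; scores are synthetic, for illustration only.
import numpy as np
from sklearn.metrics import roc_curve

def eer(labels, scores):
    """Equal error rate: point where false-acceptance rate (fpr)
    equals false-rejection rate (fnr) on the ROC curve."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2

rng = np.random.default_rng(0)
# Hypothetical languages; a larger class-separation shift simulates a
# language the detector handles better.
for lang, shift in [("en", 2.0), ("de", 1.0), ("pl", 0.5)]:
    labels = rng.integers(0, 2, 1000)             # 0 = bonafide, 1 = spoof
    scores = rng.normal(0, 1, 1000) + shift * labels
    print(f"{lang}: EER = {eer(labels, scores):.3f}")
```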
📝 Abstract
Since the majority of audio DeepFake (DF) detection methods are trained on English-centric datasets, their applicability to non-English languages remains largely unexplored. In this work, we present a benchmark for multilingual audio DF detection, evaluating various adaptation strategies. Our experiments cover models trained on English benchmark datasets, as well as intra-linguistic (same-language) and cross-linguistic adaptation approaches. Our results indicate considerable variation in detection efficacy across languages, highlighting the difficulty of multilingual settings. We show that limiting training data to English degrades detection performance, underscoring the importance of data in the target language.