🤖 AI Summary
This study addresses the limited generalization of existing voice spoofing detectors to low-resource languages and the absence of cross-lingual evaluation protocols that do not require bona fide speech from the target domain. To this end, the authors construct LRLspoof, a multilingual synthetic speech dataset spanning 66 languages—including 45 low-resource ones—comprising 2,732 hours of speech generated by 24 open-source text-to-speech systems. They propose a threshold-transfer evaluation paradigm that assesses the cross-lingual robustness of 11 publicly available anti-spoofing models using the Spoofing Rejection Rate (SRR), without relying on target-domain genuine samples. This work is the first to systematically demonstrate that language itself is an independent factor contributing to performance variability, thereby establishing the critical influence of linguistic characteristics on anti-spoofing system robustness.
📝 Abstract
We introduce LRLspoof, a large-scale multilingual synthetic-speech corpus for cross-lingual spoof detection, comprising 2,732 hours of audio generated with 24 open-source TTS systems across 66 languages, including 45 low-resource languages under our operational definition. To evaluate robustness without requiring target-domain bona fide speech, we benchmark 11 publicly available countermeasures using threshold transfer: for each model we calibrate an EER operating point on pooled external benchmarks and apply the resulting threshold, reporting the spoof rejection rate (SRR). Results show model-dependent cross-lingual disparity, with spoof rejection varying markedly across languages even under controlled conditions, highlighting language as an independent source of domain shift in spoof detection. The dataset is publicly available at \href{https://huggingface.co/datasets/MTUCI/LRLspoof}{\textbf{\underline{\textit{HuggingFace}}}} and \href{https://modelscope.cn/datasets/lab260/LRLspoof}{\textbf{\underline{\textit{ModelScope}}}}.
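The threshold-transfer protocol described above can be sketched in a few lines: calibrate a decision threshold at the equal-error-rate (EER) operating point on an external benchmark that has both bona fide and spoofed scores, then apply that fixed threshold to target-domain spoofed utterances and report the fraction rejected (SRR). The function names and score conventions below are illustrative assumptions (higher score = more bona fide-like), not the paper's actual implementation:

```python
import numpy as np

def eer_threshold(bonafide_scores, spoof_scores):
    """Threshold where false-rejection and false-acceptance rates are closest (EER point).

    Assumes higher scores indicate more bona fide-like audio.
    """
    bonafide_scores = np.asarray(bonafide_scores, dtype=float)
    spoof_scores = np.asarray(spoof_scores, dtype=float)
    # Candidate thresholds: every observed score.
    thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
    # FRR: bona fide samples scored below the threshold (wrongly rejected).
    frr = np.array([(bonafide_scores < t).mean() for t in thresholds])
    # FAR: spoofed samples scored at or above the threshold (wrongly accepted).
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])
    return thresholds[np.argmin(np.abs(far - frr))]

def spoof_rejection_rate(target_spoof_scores, threshold):
    """SRR: fraction of target-domain spoofed utterances falling below the transferred threshold."""
    return float((np.asarray(target_spoof_scores, dtype=float) < threshold).mean())

# Hypothetical scores: calibrate on an external benchmark, evaluate on a target language.
thr = eer_threshold([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
srr = spoof_rejection_rate([0.5, 0.6, 0.8], thr)
```

Because SRR needs only spoofed audio from the target domain, this sidesteps the lack of bona fide speech in low-resource languages, at the cost of measuring only one side of the error trade-off.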