Dialectal Coverage And Generalization in Arabic Speech Recognition

📅 2024-11-07

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

career value

167K/year

🤖 AI Summary

Existing Arabic automatic speech recognition (ASR) systems exhibit strong bias toward Modern Standard Arabic (MSA) and a few high-resource dialects, offering inadequate coverage for 11 low-resource spoken varieties and poor robustness to pervasive Arabic–English and Arabic–French code-switching. Method: We propose the first open-source ASR framework unifying MSA, 11 regional dialects, and bilingual code-switched speech, trained on speech data from 17 countries. Our approach integrates multi-task pretraining, dialect-adaptive fine-tuning, multilingual joint modeling, and geography-aware data fusion. Contribution/Results: On a multi-variant test set, our system reduces dialect-specific word error rates (WER) by 22% on average and lowers WER on code-switched utterances by 18%. It significantly improves generalization to low-resource dialects and cross-variant robustness. All models and curated datasets are publicly released.

Technology Category

Application Category

📝 Abstract

Developing robust automatic speech recognition (ASR) systems for Arabic requires effective strategies to manage its diversity. Existing ASR systems mainly cover the modern standard Arabic (MSA) variety and few high-resource dialects, but fall short in coverage and generalization across the multitude of spoken variants. Code-switching with English and French is also common in different regions of the Arab world, which challenges the performance of monolingual Arabic models. In this work, we introduce a suite of ASR models optimized to effectively recognize multiple variants of spoken Arabic, including MSA, various dialects, and code-switching. We provide open-source pre-trained models that cover data from 17 Arabic-speaking countries, and fine-tuned MSA and dialectal ASR models that include at least 11 variants, as well as multi-lingual ASR models covering embedded languages in code-switched utterances. We evaluate ASR performance across these spoken varieties and demonstrate both coverage and performance gains compared to prior models.

Problem

Research questions and friction points this paper is trying to address.

Addressing dialect diversity in Arabic speech recognition

Improving coverage of Modern Standard Arabic and dialects

Handling code-switching with English and French

Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimized ASR models for multiple Arabic variants

Open-source pre-trained models covering 17 countries

Multilingual ASR for code-switched utterances

🔎 Similar Papers

No similar papers found.