🤖 AI Summary
Existing Arabic automatic speech recognition (ASR) systems exhibit strong bias toward Modern Standard Arabic (MSA) and a few high-resource dialects, offering inadequate coverage for 11 low-resource spoken varieties and poor robustness to pervasive Arabic–English and Arabic–French code-switching. Method: We propose the first open-source ASR framework unifying MSA, 11 regional dialects, and bilingual code-switched speech, trained on speech data from 17 countries. Our approach integrates multi-task pretraining, dialect-adaptive fine-tuning, multilingual joint modeling, and geography-aware data fusion. Contribution/Results: On a multi-variant test set, our system reduces dialect-specific word error rates (WER) by 22% on average and lowers WER on code-switched utterances by 18%. It significantly improves generalization to low-resource dialects and cross-variant robustness. All models and curated datasets are publicly released.
📝 Abstract
Developing robust automatic speech recognition (ASR) systems for Arabic requires effective strategies to manage its diversity. Existing ASR systems mainly cover the modern standard Arabic (MSA) variety and few high-resource dialects, but fall short in coverage and generalization across the multitude of spoken variants. Code-switching with English and French is also common in different regions of the Arab world, which challenges the performance of monolingual Arabic models. In this work, we introduce a suite of ASR models optimized to effectively recognize multiple variants of spoken Arabic, including MSA, various dialects, and code-switching. We provide open-source pre-trained models that cover data from 17 Arabic-speaking countries, and fine-tuned MSA and dialectal ASR models that include at least 11 variants, as well as multi-lingual ASR models covering embedded languages in code-switched utterances. We evaluate ASR performance across these spoken varieties and demonstrate both coverage and performance gains compared to prior models.