ELYADATA & LIA at NADI 2025: ASR and ADI Subtasks

📅 2025-11-13
🏛️ Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
📈 Citations: 1
✨ Influential: 1
🤖 AI Summary
This work addresses the NADI 2025 multi-dialectal Arabic speech processing shared task, covering two subtasks: spoken Arabic Dialect Identification (ADI) and multi-dialectal Automatic Speech Recognition (ASR). Both systems rely on fine-tuning large pre-trained speech models. For ADI, we fine-tune a Whisper-large-v3 encoder with data augmentation to classify dialects. For ASR, we fine-tune SeamlessM4T-v2 Large separately for each of the eight dialects, which yields one specialized model per dialect. The ADI system achieves 79.83% accuracy on the official test set (ranked first), and the ASR systems achieve an average WER of 38.54% and an average CER of 14.53% on the test set (ranked second). The results indicate that targeted fine-tuning of large pre-trained speech models is effective for Arabic speech processing.

📝 Abstract
This paper describes Elyadata & LIA's joint submission to the NADI 2025 multi-dialectal Arabic speech processing shared task. We participated in the Spoken Arabic Dialect Identification (ADI) and multi-dialectal Arabic ASR subtasks. Our submission ranked first for the ADI subtask and second for the multi-dialectal Arabic ASR subtask among all participants. Our ADI system is a fine-tuned Whisper-large-v3 encoder with data augmentation. This system obtained the highest ADI accuracy score of 79.83% on the official test set. For multi-dialectal Arabic ASR, we fine-tuned SeamlessM4T-v2 Large (Egyptian variant) separately for each of the eight considered dialects. Overall, we obtained an average WER and CER of 38.54% and 14.53%, respectively, on the test set. Our results demonstrate the effectiveness of large pre-trained speech models with targeted fine-tuning for Arabic speech processing.
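The WER and CER figures in the abstract are edit-distance error rates: the number of word-level (or character-level) substitutions, insertions, and deletions needed to turn the system output into the reference, divided by the reference length. The following is a minimal sketch of that standard definition, not the shared task's official scorer. The function names are illustrative, the toy strings are made up, and text normalization is omitted. Counting spaces as characters in CER is an assumption, since conventions differ between scoring tools.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute, insert, delete)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; spaces are counted as characters (an assumption)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Toy example: one substituted word out of three, one character out of eleven.
print(f"WER = {wer('the cat sat', 'the cat sit'):.4f}")
print(f"CER = {cer('the cat sat', 'the cat sit'):.4f}")
```

Because a single wrong character makes the whole word count as an error, WER is typically much higher than CER on the same output, which is consistent with the gap between the two reported averages.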
Problem

Research questions and friction points this paper is trying to address.

Identifying spoken Arabic dialects using fine-tuned Whisper models with data augmentation
Developing multi-dialectal Arabic automatic speech recognition systems for eight dialects
Evaluating effectiveness of large pre-trained models for Arabic speech processing tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned Whisper encoder for dialect identification
Used data augmentation to improve ADI system accuracy
Fine-tuned SeamlessM4T separately for each Arabic dialect
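To make the encoder-based dialect identification idea concrete, the sketch below shows one common way to turn frame-level encoder outputs into an utterance-level prediction: average the frames over time, apply a linear layer, and take a softmax. This summary does not say which pooling or classification head the authors used, so the whole design is an assumption. The two-dimensional vectors, the weights, and the labels `dialect_A` and `dialect_B` are hypothetical placeholders, and pure Python stands in for a deep learning framework.

```python
import math


def mean_pool(frames):
    """Average a T x D list of frame vectors into one D-dimensional vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def linear(x, weights, bias):
    """Linear layer: one output logit per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(frames, weights, bias, labels):
    """Return the most probable label and the full probability list."""
    probs = softmax(linear(mean_pool(frames), weights, bias))
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


# Hypothetical toy inputs: two encoder frames of dimension 2, two classes.
frames = [[1.0, 0.0], [3.0, 2.0]]   # stands in for encoder outputs
weights = [[1.0, 0.0], [0.0, 1.0]]  # stands in for a trained head
bias = [0.0, 0.0]
label, probs = classify(frames, weights, bias, ["dialect_A", "dialect_B"])
print(label, [round(p, 3) for p in probs])
```

In a real system the frames would come from the fine-tuned encoder and the head would be trained jointly with it. Mean pooling is only the simplest choice for collapsing the time axis.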