NADI 2025: The First Multidialectal Arabic Speech Processing Shared Task

📅 2025-09-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper reports the findings of NADI 2025, the sixth Nuanced Arabic Dialect Identification shared task, which targets Arabic dialectal speech processing through three subtasks: spoken dialect identification, dialectal automatic speech recognition (ASR), and diacritic restoration for spoken dialects. Of the 44 teams that registered, eight submitted a total of 100 valid submissions during the testing phase. The best-performing systems reached 79.8% accuracy on spoken dialect identification, 35.68/12.20 WER/CER (overall average) on ASR, and 55/13 WER/CER on diacritic restoration. These results indicate that all three tasks remain challenging. The paper also summarizes the methods adopted by participating teams and outlines directions for future editions of NADI.

📝 Abstract

We present the findings of the sixth Nuanced Arabic Dialect Identification (NADI 2025) Shared Task, which focused on Arabic speech dialect processing across three subtasks: spoken dialect identification (Subtask 1), speech recognition (Subtask 2), and diacritic restoration for spoken dialects (Subtask 3). A total of 44 teams registered, and during the testing phase, 100 valid submissions were received from eight unique teams. The distribution was as follows: 34 submissions for Subtask 1 (five teams), 47 submissions for Subtask 2 (six teams), and 19 submissions for Subtask 3 (two teams). The best-performing systems achieved 79.8% accuracy on Subtask 1, 35.68/12.20 WER/CER (overall average) on Subtask 2, and 55/13 WER/CER on Subtask 3. These results highlight the ongoing challenges of Arabic dialect speech processing, particularly in dialect identification, recognition, and diacritic restoration. We also summarize the methods adopted by participating teams and briefly outline directions for future editions of NADI.
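
For readers unfamiliar with the WER/CER figures quoted above, the following is a minimal sketch of how word error rate and character error rate are conventionally computed from edit distance. It is not the shared task's official scorer: the function names are hypothetical, and any text normalization the organizers may apply before scoring is not modeled here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, using whitespace tokenization (an assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent; spaces count as characters in this sketch."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)
```

Both metrics are error rates, so lower is better, and either can exceed 100% when the hypothesis contains many insertions. For diacritic restoration, the same computation would typically be applied to fully diacritized text, so that a single wrong diacritic makes the whole word count as an error for WER but only one character for CER.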

Problem

Research questions and friction points this paper is trying to address.

Identifying Arabic dialects from speech data

Recognizing speech in multidialectal Arabic contexts

Restoring diacritics for spoken Arabic dialect processing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Spoken dialect identification for Arabic dialects

Speech recognition for multidialectal Arabic processing

Diacritic restoration in Arabic speech processing
