🤖 AI Summary
This study addresses the lack of systematic cross-dialectal resources for Arabic language learning, a gap that hinders comparison of linguistic variants and understanding of their cultural context. The authors construct a dataset of 552 phrases spanning six Arabic varieties (Moroccan Darija, Lebanese, Syrian, Emirati, Saudi, and Modern Standard Arabic), generated with large language models and validated by five native speakers. They also present an open-source, interactive learning platform offering translation exploration, adaptive quizzes with algorithmically generated distractors, phrases stratified by difficulty and organized thematically, cultural context annotations, and cloud-based progress synchronization. Both the dataset and the platform source code are released under the MIT License, alongside a live web interface. The work offers a structured resource for multi-dialect Arabic learning.
📝 Abstract
We present ArabicDialectHub, a cross-dialectal Arabic learning resource comprising 552 phrases across six varieties (Moroccan Darija, Lebanese, Syrian, Emirati, Saudi, and Modern Standard Arabic (MSA)) and an interactive web platform. Phrases were generated using LLMs, validated by five native speakers, stratified by difficulty, and organized thematically. The open-source platform provides translation exploration, adaptive quizzing with algorithmic distractor generation, cloud-synchronized progress tracking, and cultural context. Both the dataset and the complete platform source code are released under the MIT license. Platform: https://arabic-dialect-hub.netlify.app.