SHAMI-MT: A Syrian Arabic Dialect to Modern Standard Arabic Bidirectional Machine Translation System

📅 2025-08-04
🤖 AI Summary
Arabic is diglossic: Modern Standard Arabic (MSA) coexists with regional dialects, and the low-resource Syrian Arabic (Shami) dialect poses significant challenges for machine translation. To address this, we propose a dedicated dual-model architecture for bidirectional MSA–Shami translation, built upon AraT5v2-base-1024 and fine-tuned on the Nabra dataset, with evaluation conducted on the MADAR corpus. Our key contribution is the first end-to-end, high-fidelity, and native-like bidirectional translation system for Shami↔MSA, bridging a critical gap in low-resource dialectal MT. Evaluation with GPT-4.1 as an automatic judge yields an average score of 4.01/5.0 for MSA→Shami translation, substantially outperforming baselines, and indicates the model's effectiveness in grammatical adaptation, pragmatic naturalness, and preservation of dialect-specific features.

📝 Abstract
The rich linguistic landscape of the Arab world is characterized by a significant gap between Modern Standard Arabic (MSA), the language of formal communication, and the diverse regional dialects used in everyday life. This diglossia presents a formidable challenge for natural language processing, particularly machine translation. This paper introduces SHAMI-MT, a bidirectional machine translation system specifically engineered to bridge the communication gap between MSA and the Syrian dialect. We present two specialized models, one for MSA-to-Shami and another for Shami-to-MSA translation, both built upon the state-of-the-art AraT5v2-base-1024 architecture. The models were fine-tuned on the comprehensive Nabra dataset and rigorously evaluated on unseen data from the MADAR corpus. Our MSA-to-Shami model achieved an average quality score of 4.01 out of 5.0 when judged by OpenAI's GPT-4.1 model, demonstrating its ability to produce translations that are not only accurate but also dialectally authentic. This work provides a crucial, high-fidelity tool for a previously underserved language pair, advancing the field of dialectal Arabic translation and offering significant applications in content localization, cultural heritage, and intercultural communication.
Problem

Research questions and friction points this paper is trying to address.

Bridging the translation gap between Syrian Arabic and Modern Standard Arabic
Developing bidirectional machine translation for dialectal Arabic
Addressing the diglossia challenge in Arabic natural language processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bidirectional MSA-Syrian dialect translation system
AraT5v2-base-1024 architecture for specialized models
Fine-tuned on Nabra dataset for dialectal accuracy
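
The dual-model design described above (one fine-tuned sequence-to-sequence model per translation direction, scored on a 1 to 5 scale by a judge model) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' code: the checkpoint paths, the direction keys, the beam size, and the helper names `load_translator` and `mean_judge_score` are all assumptions, since the summary does not say where the fine-tuned weights live or how decoding was configured. It assumes the Hugging Face `transformers` library and local copies of two fine-tuned AraT5v2-style checkpoints.

```python
from typing import Callable, Dict, List

# Hypothetical local paths to the two fine-tuned checkpoints (one per direction).
# The summary does not name the released weights, so these are placeholders.
CHECKPOINTS: Dict[str, str] = {
    "msa2shami": "./checkpoints/msa-to-shami",
    "shami2msa": "./checkpoints/shami-to-msa",
}


def load_translator(direction: str, max_new_tokens: int = 128) -> Callable[[List[str]], List[str]]:
    """Return a translate(sentences) function backed by the model for `direction`."""
    if direction not in CHECKPOINTS:
        raise ValueError(f"unknown direction {direction!r}; expected one of {sorted(CHECKPOINTS)}")

    # Imported lazily so the routing logic above works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    path = CHECKPOINTS[direction]
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForSeq2SeqLM.from_pretrained(path)

    def translate(sentences: List[str]) -> List[str]:
        # max_length=1024 is an assumption read off the base checkpoint's name.
        batch = tokenizer(
            sentences, return_tensors="pt", padding=True, truncation=True, max_length=1024
        )
        # Beam search with 4 beams is an illustrative choice, not a reported setting.
        output_ids = model.generate(**batch, max_new_tokens=max_new_tokens, num_beams=4)
        return tokenizer.batch_decode(output_ids, skip_special_tokens=True)

    return translate


def mean_judge_score(scores: List[float]) -> float:
    """Average per-sentence judge ratings, each expected to lie on the 1-5 scale."""
    if not scores:
        raise ValueError("no scores to average")
    if any(not 1.0 <= s <= 5.0 for s in scores):
        raise ValueError("judge scores must lie between 1 and 5")
    return sum(scores) / len(scores)
```

With real checkpoints in place, `load_translator("msa2shami")(["..."])` would return the dialectal output for a batch of MSA sentences, and `mean_judge_score` would aggregate whatever per-sentence ratings the judge model returns. Keeping a separate model per direction means each one can be loaded only when its direction is actually needed.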