From FusHa to Folk: Exploring Cross-Lingual Transfer in Arabic Language Models

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study examines how well Arabic language models, which are pretrained predominantly on Modern Standard Arabic (MSA), transfer to dialectal varieties. Using probing on three NLP tasks together with representational similarity analysis, the authors assess cross-lingual transfer from MSA to regional dialects. They find that transfer is possible but uneven across dialects, and that the differences are partially explained by geographic proximity. They also find evidence of negative interference in models trained to support all Arabic dialects, which calls into question how similar the dialects really are and raises concerns for cross-lingual transfer in Arabic models.

📝 Abstract
Arabic Language Models (LMs) are pretrained predominantly on Modern Standard Arabic (MSA) and are expected to transfer to its dialects. While MSA, as the standard written variety, is commonly used in formal settings, people speak and write online in various dialects that are spread across the Arab region. This poses limitations for Arabic LMs, since the dialects vary in their similarity to MSA. In this work, we study cross-lingual transfer of Arabic models using probing on three Natural Language Processing (NLP) tasks and representational similarity. Our results indicate that transfer is possible but disproportionate across dialects, which we find to be partially explained by their geographic proximity. Furthermore, we find evidence of negative interference in models trained to support all Arabic dialects. This calls their degree of similarity into question and raises concerns for cross-lingual transfer in Arabic models.
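The abstract does not say which representational similarity measure the authors use. As an illustration only, the sketch below implements linear centered kernel alignment (CKA), one common way to compare two sets of representations computed for the same inputs. The function names, the variable names `msa_reps` and `dialect_reps`, and the toy numbers are all hypothetical and are not taken from the paper.

```python
import math


def center_columns(matrix):
    """Subtract each column's mean so every feature has zero mean."""
    n_rows = len(matrix)
    means = [sum(col) / n_rows for col in zip(*matrix)]
    return [[value - mean for value, mean in zip(row, means)] for row in matrix]


def cross_gram_sq_norm(a, b):
    """Squared Frobenius norm of a^T b, summed over all feature pairs."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA between two representation matrices.

    x and y are lists of rows; row i in both must describe the same input
    (for example, the same sentence in two varieties, or the same sentence
    encoded by two models). Feature dimensions may differ.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of rows")
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_gram_sq_norm(xc, yc)
    denominator = math.sqrt(cross_gram_sq_norm(xc, xc) * cross_gram_sq_norm(yc, yc))
    if denominator == 0.0:
        raise ValueError("representations have no variance after centering")
    return numerator / denominator


# Hypothetical toy data: 4 aligned inputs, 3 features each (not real model output).
msa_reps = [[0.2, 1.0, 0.5], [0.9, 0.1, 0.4], [0.3, 0.8, 0.7], [0.6, 0.5, 0.1]]
dialect_reps = [[0.1, 0.9, 0.6], [1.0, 0.2, 0.3], [0.4, 0.7, 0.9], [0.5, 0.6, 0.2]]

print(linear_cka(msa_reps, dialect_reps))
```

Linear CKA ranges from 0 to 1 and is unchanged by uniform rescaling or by an orthogonal transformation of the features, which is why it is often used when the two representation spaces are not directly aligned. Whether it matches the analysis in this paper is an assumption.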
Problem

Research questions and friction points this paper is trying to address.

Arabic Language Models
Modern Standard Arabic
dialects
cross-lingual transfer
negative interference
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-lingual transfer
Arabic dialects
negative interference
representational similarity
geographic proximity
Abdulmuizz Khalak
Department of Advanced Computing Sciences, Maastricht University
Abderrahmane Issam
Department of Advanced Computing Sciences, Maastricht University
Gerasimos Spanakis
Maastricht University
Assistant Professor