🤖 AI Summary
This work addresses the dual challenge of data and parameter scarcity in Arabic Dialect Identification (ADI). We propose an efficient fine-tuning framework that combines soft prompting with Low-Rank Adaptation (LoRA). Methodologically, we systematically compare prominent parameter-efficient techniques—Prefix-tuning, P-tuning variants, Prompt-tuning, and LoRA—on Arabic-specific encoder models, and separately probe Large Language Models under zero-shot and few-shot prompting. Results show that LoRA fine-tuning outperforms even full-parameter fine-tuning; soft-prompting methods achieve competitive but lower performance; and LLMs still exhibit limited dialect discrimination in zero- and few-shot scenarios. To our knowledge, this is the first study to jointly apply LoRA and soft prompting to ADI, empirically validating the effectiveness of parameter-efficient paradigms for low-resource Arabic NLP. Our approach provides a scalable, lightweight pathway for dialect identification in resource-constrained languages.
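As background for the soft-prompting side of the framework, the core idea of prompt-tuning is to prepend a small number of trainable "virtual token" embeddings to the frozen model's input embeddings, so only those vectors are updated during training. The following is a minimal illustrative sketch with toy dimensions, not the paper's actual setup (which uses Arabic-specific encoder models); all names and sizes here are hypothetical.

```python
import numpy as np

# Toy prompt-tuning sketch (illustrative only; dimensions are made up).
rng = np.random.default_rng(0)
n_virtual, seq_len, d_model = 4, 10, 16

token_embeds = rng.normal(size=(seq_len, d_model))   # frozen input embeddings
soft_prompt = rng.normal(size=(n_virtual, d_model))  # the ONLY trainable params

# The soft prompt is prepended to the embedded input; the frozen
# backbone then processes the extended sequence as usual.
extended = np.concatenate([soft_prompt, token_embeds], axis=0)
assert extended.shape == (n_virtual + seq_len, d_model)
```

Prefix-tuning and the P-tuning variants differ mainly in *where* the trainable vectors are injected (e.g., at every layer's key/value states rather than only at the input), but the parameter-efficiency argument is the same: the backbone stays frozen.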
📝 Abstract
This paper discusses our exploration of data-efficient and parameter-efficient approaches to Arabic Dialect Identification (ADI). In particular, we investigate several soft-prompting strategies—prefix-tuning, prompt-tuning, P-tuning, and P-tuning V2—as well as LoRA reparameterization. For the data-efficient strategy, we use hard prompting with zero-shot and few-shot inference to probe the dialect-identification capabilities of Large Language Models (LLMs). For the parameter-efficient fine-tuning (PEFT) approaches, we conduct experiments with Arabic-specific encoder models on several major datasets. We also analyze n-shot inference on open-source decoder-only models: a general multilingual model (Phi-3.5) and an Arabic-specific one (SILMA). We observe that the LLMs generally struggle to differentiate dialectal nuances in zero-shot and few-shot setups. The soft-prompted encoder variants perform better, while the LoRA-based fine-tuned models perform best, even surpassing full fine-tuning.
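For readers unfamiliar with the LoRA reparameterization mentioned above: a frozen weight matrix W is augmented with a trainable low-rank update ΔW = BA, so only r·(d_in + d_out) parameters are trained instead of d_in·d_out. The sketch below is a minimal NumPy illustration under assumed toy dimensions, not the configuration used in the paper.

```python
import numpy as np

# Minimal LoRA sketch (illustrative; dimensions and scaling are assumptions).
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 16, 2, 4

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection (zero init)

def lora_forward(x):
    # Frozen path plus scaled low-rank path: (W + (alpha/r) * B @ A) @ x
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B initialized to zero, the adapted layer starts identical to W:
assert np.allclose(lora_forward(x), W @ x)
# Only 48 trainable parameters here, versus 128 for full fine-tuning of W.
assert A.size + B.size == r * (d_in + d_out)
```

Because the rank r is small relative to the layer dimensions, LoRA trains a tiny fraction of the parameters, which is what makes it attractive in the low-resource setting studied here.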