🤖 AI Summary
This study addresses machine translation for the low-resource language Sylheti. We conduct the first systematic evaluation of multilingual Transformer models on Bengali→Sylheti translation. Supervised fine-tuning is applied to mBART-50 and MarianMT, and their performance is benchmarked against zero-shot large language models (LLMs). Results show that fine-tuning substantially improves translation quality: mBART-50 achieves the highest adequacy, while MarianMT excels in character-level fidelity, and both significantly outperform the zero-shot LLMs. Our work establishes the first high-quality Bengali→Sylheti parallel corpus and corresponding benchmark models. Crucially, it empirically validates lightweight fine-tuning as an effective approach to neural machine translation (NMT) for marginalized languages, offering a reproducible methodology that supports equitable technological access for low-resource languages.
📝 Abstract
Machine Translation (MT) has advanced from rule-based and statistical methods to neural approaches based on the Transformer architecture. While these methods have achieved impressive results for high-resource languages, low-resource varieties such as Sylheti remain underexplored. In this work, we investigate Bengali-to-Sylheti translation by fine-tuning multilingual Transformer models and comparing them with zero-shot large language models (LLMs). Experimental results demonstrate that fine-tuned models significantly outperform LLMs, with mBART-50 achieving the highest translation adequacy and MarianMT showing the strongest character-level fidelity. These findings highlight the importance of task-specific adaptation for underrepresented languages and contribute to ongoing efforts toward inclusive language technologies.
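The "character-level fidelity" on which MarianMT leads is typically measured with a character n-gram F-score such as chrF. As an illustration only (the paper's exact metric and parameters are not specified here; `max_n=6` and `beta=2` are the common chrF defaults, assumed), a simplified chrF-style score can be sketched as:

```python
from collections import Counter

def char_ngrams(text, n):
    """Counter of character n-grams, with spaces removed (as chrF does)."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF-style score: per-order character n-gram precision and
    recall, averaged over orders 1..max_n, combined into an F-beta score.
    Illustrative sketch only; real evaluations should use a standard
    implementation such as sacreBLEU's chrF."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        hyp_total, ref_total = sum(hyp.values()), sum(ref.values())
        if hyp_total == 0 or ref_total == 0:
            continue  # strings too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / hyp_total)
        recalls.append(overlap / ref_total)
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily, as in standard chrF
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Because it operates on characters rather than word tokens, such a metric rewards partial matches on morphologically close Bengali/Sylheti word forms, which is why it is a natural complement to adequacy-oriented scores.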