Transformer-Based Low-Resource Language Translation: A Study on Standard Bengali to Sylheti

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the machine translation challenge for the low-resource language Sylheti. We conduct the first systematic evaluation of multilingual Transformer models on the Standard Bengali→Sylheti translation task. Supervised fine-tuning is applied to mBART-50 and MarianMT, and their performance is benchmarked against zero-shot large language models (LLMs). Results show that fine-tuning substantially improves translation quality: mBART-50 achieves the highest adequacy, MarianMT excels in character-level fidelity, and both significantly outperform zero-shot LLMs. Our work establishes the first high-quality Bengali→Sylheti parallel corpus and corresponding benchmark models. Crucially, it validates the effectiveness of lightweight fine-tuning for neural machine translation (NMT) of marginalized languages, offering a reproducible methodology and empirical evidence in support of equitable technological access for low-resource languages.
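The summary does not include the training recipe, but a supervised fine-tuning setup of the kind described can be reproduced with the Hugging Face `Trainer` API. The sketch below is a minimal, illustrative version for mBART-50: the corpus file `bn_syl.jsonl`, its `bn`/`syl` fields, the hyperparameters, and the reuse of the Bengali language tag for Sylheti are all assumptions, not details taken from the paper.

```python
from datasets import load_dataset
from transformers import (
    DataCollatorForSeq2Seq,
    MBart50TokenizerFast,
    MBartForConditionalGeneration,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# mBART-50 ships no Sylheti language code, so reusing the Bengali tag for
# both sides is an assumption; the paper does not specify its tag choice.
tokenizer.src_lang = "bn_IN"
tokenizer.tgt_lang = "bn_IN"

# "bn_syl.jsonl" with fields "bn" and "syl" is a hypothetical corpus file.
dataset = load_dataset("json", data_files="bn_syl.jsonl", split="train")

def preprocess(batch):
    # Encode Bengali sources and Sylheti targets into token-ID sequences.
    features = tokenizer(batch["bn"], max_length=128, truncation=True)
    labels = tokenizer(text_target=batch["syl"], max_length=128, truncation=True)
    features["labels"] = labels["input_ids"]
    return features

tokenized = dataset.map(preprocess, batched=True,
                        remove_columns=dataset.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="mbart50-bn-syl",
        per_device_train_batch_size=8,
        learning_rate=3e-5,
        num_train_epochs=5,
    ),
    train_dataset=tokenized,
    # The collator pads inputs and replaces label padding with -100 so that
    # padded positions are ignored by the loss.
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

The same wiring applies to MarianMT by swapping in a Marian checkpoint and its tokenizer; no Marian model pair exists for Bengali→Sylheti, so starting from a related Bengali checkpoint would itself be an assumption.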

📝 Abstract
Machine Translation (MT) has advanced from rule-based and statistical methods to neural approaches based on the Transformer architecture. While these methods have achieved impressive results for high-resource languages, low-resource varieties such as Sylheti remain underexplored. In this work, we investigate Bengali-to-Sylheti translation by fine-tuning multilingual Transformer models and comparing them with zero-shot large language models (LLMs). Experimental results demonstrate that fine-tuned models significantly outperform LLMs, with mBART-50 achieving the highest translation adequacy and MarianMT showing the strongest character-level fidelity. These findings highlight the importance of task-specific adaptation for underrepresented languages and contribute to ongoing efforts toward inclusive language technologies.
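The abstract contrasts fine-tuned models with zero-shot LLMs but names neither the LLMs nor the prompts used. A generic zero-shot baseline might look like the sketch below; the checkpoint and prompt wording are purely illustrative, not the paper's setup.

```python
from transformers import pipeline

# Illustrative open-weights model; the paper does not identify its LLMs.
generator = pipeline("text-generation",
                     model="meta-llama/Llama-3.1-8B-Instruct")

# "<Bengali sentence>" is a placeholder for a real source sentence.
prompt = (
    "Translate the following Standard Bengali sentence into Sylheti.\n"
    "Bengali: <Bengali sentence>\n"
    "Sylheti:"
)
result = generator(prompt, max_new_tokens=64, do_sample=False)
print(result[0]["generated_text"])
```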
Problem

Research questions and friction points this paper is trying to address.

Developing machine translation for the low-resource language Sylheti
Comparing fine-tuned Transformers with zero-shot LLMs
Enhancing translation quality for underrepresented language varieties
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning multilingual Transformer models for translation
Comparing fine-tuned models with zero-shot LLMs
Achieving the highest translation adequacy with mBART-50 (see the metric sketch below)
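Adequacy and character-level fidelity are commonly measured with corpus metrics such as BLEU and chrF; the paper's wording suggests metrics of this kind, though none are listed here. A minimal scoring sketch with sacrebleu, using placeholder outputs and references:

```python
from sacrebleu.metrics import BLEU, CHRF

# Placeholders; a real evaluation would use the paper's held-out
# Bengali->Sylheti test split.
hypotheses = ["..."]    # one system translation per sentence
references = [["..."]]  # one list of references per hypothesis

bleu = BLEU()   # n-gram overlap, a proxy for adequacy
chrf = CHRF()   # character n-gram F-score, i.e. character-level fidelity
print(bleu.corpus_score(hypotheses, references))
print(chrf.corpus_score(hypotheses, references))
```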
Mangsura Kabir Oni
Faculty of Computer Science and Engineering, University of Brahmanbaria, Bangladesh
Tabia Tanzin Prama
PhD student in Computer Science
Data Mining · NLP · Health Informatics · AI Ethics