Transformer-Based Low-Resource Language Translation: A Study on Standard Bengali to Sylheti

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the machine translation challenge for the low-resource language Sylheti. We conduct the first systematic evaluation of multilingual Transformer models on the Standard Bengali→Sylheti translation task. Supervised fine-tuning is applied to mBART-50 and MarianMT, and their performance is benchmarked against zero-shot large language models (LLMs). Results show that fine-tuning substantially improves translation quality: mBART-50 achieves the highest adequacy, MarianMT excels in character-level fidelity, and both significantly outperform zero-shot LLMs. Our work establishes the first high-quality Bengali→Sylheti parallel corpus and corresponding benchmark models. Crucially, it validates the effectiveness of lightweight fine-tuning for neural machine translation (NMT) of marginalized languages, offering a reproducible methodology and empirical evidence in support of equitable technological access for low-resource languages.
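The summary does not include the training recipe, but a supervised fine-tuning setup of the kind described can be reproduced with the Hugging Face `Trainer` API. The sketch below is a minimal, illustrative version for mBART-50: the corpus file `bn_syl.jsonl`, its `bn`/`syl` fields, the hyperparameters, and the reuse of the Bengali language tag for Sylheti are all assumptions, not details taken from the paper.

```python
from datasets import load_dataset
from transformers import (
    DataCollatorForSeq2Seq,
    MBart50TokenizerFast,
    MBartForConditionalGeneration,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# mBART-50 ships no Sylheti language code, so reusing the Bengali tag for
# both sides is an assumption; the paper does not specify its tag choice.
tokenizer.src_lang = "bn_IN"
tokenizer.tgt_lang = "bn_IN"

# "bn_syl.jsonl" with fields "bn" and "syl" is a hypothetical corpus file.
dataset = load_dataset("json", data_files="bn_syl.jsonl", split="train")

def preprocess(batch):
    # Encode Bengali sources and Sylheti targets into token-ID sequences.
    features = tokenizer(batch["bn"], max_length=128, truncation=True)
    labels = tokenizer(text_target=batch["syl"], max_length=128, truncation=True)
    features["labels"] = labels["input_ids"]
    return features

tokenized = dataset.map(preprocess, batched=True,
                        remove_columns=dataset.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="mbart50-bn-syl",
        per_device_train_batch_size=8,
        learning_rate=3e-5,
        num_train_epochs=5,
    ),
    train_dataset=tokenized,
    # The collator pads inputs and replaces label padding with -100 so that
    # padded positions are ignored by the loss.
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

The same wiring applies to MarianMT by swapping in a Marian checkpoint and its tokenizer; no Marian model pair exists for Bengali→Sylheti, so starting from a related Bengali checkpoint would itself be an assumption.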

📝 Abstract
Machine Translation (MT) has advanced from rule-based and statistical methods to neural approaches based on the Transformer architecture. While these methods have achieved impressive results for high-resource languages, low-resource varieties such as Sylheti remain underexplored. In this work, we investigate Bengali-to-Sylheti translation by fine-tuning multilingual Transformer models and comparing them with zero-shot large language models (LLMs). Experimental results demonstrate that fine-tuned models significantly outperform LLMs, with mBART-50 achieving the highest translation adequacy and MarianMT showing the strongest character-level fidelity. These findings highlight the importance of task-specific adaptation for underrepresented languages and contribute to ongoing efforts toward inclusive language technologies.
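The abstract contrasts fine-tuned models with zero-shot LLMs but names neither the LLMs nor the prompts used. A generic zero-shot baseline might look like the sketch below; the checkpoint and prompt wording are purely illustrative, not the paper's setup.

```python
from transformers import pipeline

# Illustrative open-weights model; the paper does not identify its LLMs.
generator = pipeline("text-generation",
                     model="meta-llama/Llama-3.1-8B-Instruct")

# "<Bengali sentence>" is a placeholder for a real source sentence.
prompt = (
    "Translate the following Standard Bengali sentence into Sylheti.\n"
    "Bengali: <Bengali sentence>\n"
    "Sylheti:"
)
result = generator(prompt, max_new_tokens=64, do_sample=False)
print(result[0]["generated_text"])
```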
Problem

Research questions and friction points this paper is trying to address.

Developing machine translation for the low-resource language Sylheti
Comparing fine-tuned Transformers with zero-shot LLMs
Enhancing translation quality for underrepresented language varieties
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning multilingual Transformer models for translation
Comparing fine-tuned models with zero-shot LLMs
Achieving the highest translation adequacy with mBART-50 (see the metric sketch below)
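Adequacy and character-level fidelity are commonly measured with corpus metrics such as BLEU and chrF; the paper's wording suggests metrics of this kind, though none are listed here. A minimal scoring sketch with sacrebleu, using placeholder outputs and references:

```python
from sacrebleu.metrics import BLEU, CHRF

# Placeholders; a real evaluation would use the paper's held-out
# Bengali->Sylheti test split.
hypotheses = ["..."]    # one system translation per sentence
references = [["..."]]  # one list of references per hypothesis

bleu = BLEU()   # n-gram overlap, a proxy for adequacy
chrf = CHRF()   # character n-gram F-score, i.e. character-level fidelity
print(bleu.corpus_score(hypotheses, references))
print(chrf.corpus_score(hypotheses, references))
```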
Mangsura Kabir Oni
Faculty of Computer Science and Engineering, University of Brahmanbaria, Bangladesh
Tabia Tanzin Prama
PhD student in Computer Science
Data Mining · NLP · Health Informatics · AI Ethics