Beyond Vanilla Fine-Tuning: Leveraging Multistage, Multilingual, and Domain-Specific Methods for Low-Resource Machine Translation

📅 2025-03-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
For neural machine translation (NMT) into extremely low-resource languages such as Sinhala and Tamil, where parallel training data is severely limited (<100K sentence pairs), standard fine-tuning yields suboptimal performance. To address this, we propose a multi-stage adaptation framework: (1) domain-adaptive continual pre-training (CPT) on monolingual target-language corpora; (2) intermediate-task transfer learning (ITTL) on cross-domain parallel data; and (3) ensemble inference that integrates outputs from multiple adapted models. The key contribution is the combination of domain-specific monolingual CPT with cross-domain ITTL, which alleviates the data bottleneck inherent in conventional single-stage fine-tuning. Experiments across six Sinhala–Tamil–English translation directions show an average BLEU improvement of +1.47 over strong fine-tuning baselines; the ensemble further amplifies these gains, consistently outperforming standard approaches.
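The staged schedule described above can be sketched as follows. This is a minimal structural illustration only; all function and file names are hypothetical, and the paper's actual training code and objectives are not reproduced here.

```python
# Illustrative sketch of the three-stage adaptation order:
# CPT on monolingual data -> ITTL on cross-domain parallel data -> in-domain fine-tuning.
# The "model" is represented as a list of completed stages for clarity.

def continual_pretrain(model, monolingual_corpus):
    # Stage 1: domain-adaptive continual pre-training on target-language
    # monolingual text, compensating for LRL under-representation.
    return model + ["CPT"]

def intermediate_task_transfer(model, out_of_domain_parallel):
    # Stage 2: ITTL on cross-domain parallel data to strengthen general
    # translation ability before in-domain adaptation.
    return model + ["ITTL"]

def fine_tune(model, in_domain_parallel):
    # Stage 3: conventional fine-tuning on the small in-domain set.
    return model + ["FT"]

def adapt(base_model, mono, ood_parallel, id_parallel):
    m = continual_pretrain(base_model, mono)
    m = intermediate_task_transfer(m, ood_parallel)
    return fine_tune(m, id_parallel)

stages = adapt([], "si_mono.txt", "ood.si-en", "id.si-en")
print(stages)  # ['CPT', 'ITTL', 'FT']
```

The point of the sketch is the ordering: each stage starts from the checkpoint produced by the previous one, rather than fine-tuning the base msLLM in a single step.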

📝 Abstract
Fine-tuning multilingual sequence-to-sequence large language models (msLLMs) has shown promise in developing neural machine translation (NMT) systems for low-resource languages (LRLs). However, conventional single-stage fine-tuning methods struggle in extremely low-resource NMT settings, where training data is very limited. This paper contributes to artificial intelligence by proposing two approaches for adapting msLLMs in these challenging scenarios: (1) continual pre-training (CPT), where the msLLM is further trained with domain-specific monolingual data to compensate for the under-representation of LRLs, and (2) intermediate task transfer learning (ITTL), a method that fine-tunes the msLLM with both in-domain and out-of-domain parallel data to enhance its translation capabilities across various domains and tasks. As an application in engineering, these methods are implemented in NMT systems for Sinhala, Tamil, and English (six language pairs) in domain-specific, extremely low-resource settings (datasets containing fewer than 100,000 samples). Our experiments reveal that these approaches enhance translation performance by an average of +1.47 bilingual evaluation understudy (BLEU) score compared to the standard single-stage fine-tuning baseline across all translation directions. Additionally, a multi-model ensemble further improves performance by an additional BLEU point.
Problem

Research questions and friction points this paper is trying to address.

Improving low-resource machine translation with multistage methods
Enhancing multilingual models via domain-specific data adaptation
Boosting translation accuracy in extremely limited data scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multistage fine-tuning with continual pre-training
Multilingual transfer learning using intermediate tasks
Domain-specific adaptation for low-resource languages