Beyond Vanilla Fine-Tuning: Leveraging Multistage, Multilingual, and Domain-Specific Methods for Low-Resource Machine Translation

📅 2025-03-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
For neural machine translation (NMT) into extremely low-resource languages such as Sinhala and Tamil, where parallel training data is severely limited (<100K sentence pairs), standard fine-tuning yields suboptimal performance. To address this, we propose a multi-stage adaptation framework: (1) domain-adaptive continual pre-training (CPT) on monolingual target-language corpora; (2) intermediate-task transfer learning (ITTL) on cross-domain parallel data; and (3) ensemble inference that integrates outputs from multiple adapted models. The key contribution is the combination of domain-specific monolingual CPT with cross-domain ITTL, which alleviates the data bottleneck inherent in conventional single-stage fine-tuning. Experiments across six Sinhala–Tamil–English translation directions show an average BLEU improvement of +1.47 over strong fine-tuning baselines; the ensemble further amplifies these gains, consistently outperforming standard approaches.
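The staged schedule described above can be sketched as follows. This is a minimal structural illustration only; all function and file names are hypothetical, and the paper's actual training code and objectives are not reproduced here.

```python
# Illustrative sketch of the three-stage adaptation order:
# CPT on monolingual data -> ITTL on cross-domain parallel data -> in-domain fine-tuning.
# The "model" is represented as a list of completed stages for clarity.

def continual_pretrain(model, monolingual_corpus):
    # Stage 1: domain-adaptive continual pre-training on target-language
    # monolingual text, compensating for LRL under-representation.
    return model + ["CPT"]

def intermediate_task_transfer(model, out_of_domain_parallel):
    # Stage 2: ITTL on cross-domain parallel data to strengthen general
    # translation ability before in-domain adaptation.
    return model + ["ITTL"]

def fine_tune(model, in_domain_parallel):
    # Stage 3: conventional fine-tuning on the small in-domain set.
    return model + ["FT"]

def adapt(base_model, mono, ood_parallel, id_parallel):
    m = continual_pretrain(base_model, mono)
    m = intermediate_task_transfer(m, ood_parallel)
    return fine_tune(m, id_parallel)

stages = adapt([], "si_mono.txt", "ood.si-en", "id.si-en")
print(stages)  # ['CPT', 'ITTL', 'FT']
```

The point of the sketch is the ordering: each stage starts from the checkpoint produced by the previous one, rather than fine-tuning the base msLLM in a single step.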

📝 Abstract
Fine-tuning multilingual sequence-to-sequence large language models (msLLMs) has shown promise in developing neural machine translation (NMT) systems for low-resource languages (LRLs). However, conventional single-stage fine-tuning methods struggle in extremely low-resource NMT settings, where training data is very limited. This paper contributes to artificial intelligence by proposing two approaches for adapting msLLMs in these challenging scenarios: (1) continual pre-training (CPT), where the msLLM is further trained with domain-specific monolingual data to compensate for the under-representation of LRLs, and (2) intermediate task transfer learning (ITTL), a method that fine-tunes the msLLM with both in-domain and out-of-domain parallel data to enhance its translation capabilities across various domains and tasks. As an application in engineering, these methods are implemented in NMT systems for Sinhala, Tamil, and English (six language pairs) in domain-specific, extremely low-resource settings (datasets containing fewer than 100,000 samples). Our experiments reveal that these approaches enhance translation performance by an average of +1.47 bilingual evaluation understudy (BLEU) score compared to the standard single-stage fine-tuning baseline across all translation directions. Additionally, a multi-model ensemble further improves performance by an additional BLEU point.
Problem

Research questions and friction points this paper is trying to address.

Improving low-resource machine translation with multistage methods
Enhancing multilingual models via domain-specific data adaptation
Boosting translation accuracy in extremely limited data scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multistage fine-tuning with continual pre-training
Multilingual transfer learning using intermediate tasks
Domain-specific adaptation for low-resource languages