Optimizing the Training Schedule of Multilingual NMT using Reinforcement Learning

📅 2024-10-08

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To improve translation quality for low-resource languages (LRLs) in multilingual neural machine translation (NMT), this paper uses reinforcement learning to optimize the training schedule, i.e., the order in which languages are presented. It applies two algorithms in a many-to-one setting: Teacher-Student Curriculum Learning, which keeps an exponentially smoothed estimate of each action's return based on the loss on monolingual or multilingual development subsets, and a Deep Q Network (DQN), which estimates rewards with an additional neural network trained on the history of states, selected actions, and received rewards. On an 8-to-1 translation dataset containing both LRLs and high-resource languages (HRLs), the DQN method improves BLEU and COMET scores over two baselines, random selection of monolingual batches and shuffled multilingual batches, by adjusting how often LRL and HRL batches are presented.

📝 Abstract

Multilingual NMT is a viable solution for translating low-resource languages (LRLs) when data from high-resource languages (HRLs) from the same language family is available. However, the training schedule, i.e., the order of presentation of languages, has an impact on the quality of such systems. Here, in a many-to-one translation setting, we propose to apply two algorithms that use reinforcement learning to optimize the training schedule of NMT: (1) Teacher-Student Curriculum Learning and (2) Deep Q Network. The former uses an exponentially smoothed estimate of the returns of each action based on the loss on monolingual or multilingual development subsets, while the latter estimates rewards using an additional neural network trained from the history of actions selected in different states of the system, together with the rewards received. On an 8-to-1 translation dataset with LRLs and HRLs, our second method improves BLEU and COMET scores with respect to both random selection of monolingual batches and shuffled multilingual batches, by adjusting the number of presentations of LRL vs. HRL batches.
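The exponentially smoothed return estimate mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `SmoothedReturnScheduler`, the reward definition (drop in development-set loss after training on a batch), the epsilon-greedy selection rule, and the values of `alpha` and `epsilon` are all assumptions made for illustration.

```python
import random


class SmoothedReturnScheduler:
    """Hypothetical scheduler: one action per source language.

    Keeps an exponentially smoothed estimate of the return of each action
    and picks the next language epsilon-greedily. All names and default
    values here are illustrative assumptions, not taken from the paper.
    """

    def __init__(self, languages, alpha=0.1, epsilon=0.1, seed=0):
        self.languages = list(languages)
        self.alpha = alpha      # smoothing factor (assumed value)
        self.epsilon = epsilon  # exploration rate (assumed value)
        self.q = {lang: 0.0 for lang in self.languages}
        self.rng = random.Random(seed)

    def choose(self):
        """Return the language whose batch should be presented next."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.languages)
        return max(self.languages, key=lambda lang: self.q[lang])

    def update(self, lang, dev_loss_before, dev_loss_after):
        """Update the smoothed return for `lang`.

        Assumption: the reward is the decrease in development-set loss
        observed after training on one batch of `lang`.
        """
        reward = dev_loss_before - dev_loss_after
        # Exponential smoothing: q <- q + alpha * (reward - q)
        self.q[lang] += self.alpha * (reward - self.q[lang])
        return self.q[lang]
```

In a training loop, `choose()` would be called before each batch and `update()` after evaluating on the development subset. A larger `alpha` makes the estimate track recent rewards more closely, while a smaller one averages over a longer history.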

Problem

Research questions and friction points this paper is trying to address.

Optimizing training schedule for multilingual NMT using RL

Improving translation quality for low-resource languages

Enhancing BLEU and COMET scores via adaptive batch selection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning optimizes NMT training schedule

Teacher-Student Curriculum Learning uses smoothed returns

Deep Q Network estimates rewards via neural network
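The idea of estimating rewards from a history of (state, action, reward) records can be sketched as follows. This is a deliberately simplified stand-in, not the paper's Deep Q Network: it replaces the neural network with one linear model per action, omits discounting and target networks, and uses hypothetical names (`LinearRewardEstimator`, `record`, `train`, `best_action`) and made-up state features.

```python
import random


class LinearRewardEstimator:
    """Hypothetical reward estimator trained from a replay history.

    Assumption: a linear model per action stands in for the neural
    network described in the paper, to keep the sketch dependency-free.
    """

    def __init__(self, n_features, actions, lr=0.05, seed=0):
        self.actions = list(actions)
        self.lr = lr
        # One weight vector per action; the last entry is the bias.
        self.w = {a: [0.0] * (n_features + 1) for a in self.actions}
        self.history = []  # replay history of (state, action, reward)
        self.rng = random.Random(seed)

    def predict(self, state, action):
        """Estimated reward of taking `action` in `state`."""
        w = self.w[action]
        return sum(wi * xi for wi, xi in zip(w, state)) + w[-1]

    def record(self, state, action, reward):
        """Store one observed transition in the history."""
        self.history.append((list(state), action, reward))

    def train(self, steps=200, batch_size=8):
        """Fit the estimator with SGD on mini-batches from the history."""
        for _ in range(steps):
            k = min(batch_size, len(self.history))
            for state, action, reward in self.rng.sample(self.history, k):
                err = self.predict(state, action) - reward
                w = self.w[action]
                for i, xi in enumerate(state):
                    w[i] -= self.lr * err * xi
                w[-1] -= self.lr * err

    def best_action(self, state):
        """Action with the highest estimated reward in `state`."""
        return max(self.actions, key=lambda a: self.predict(state, a))
```

Here the state could be any feature vector describing the system (for example, recent development losses), and the actions could be the choice between presenting an LRL batch and an HRL batch. Both choices are assumptions for the sake of the example.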

🔎 Similar Papers

Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization