CLewR: Curriculum Learning with Restarts for Machine Translation Preference Learning

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing research indicates that preference optimization in machine translation is prone to catastrophic forgetting of easy samples due to suboptimal training data ordering, which degrades overall performance. To address this issue, this work proposes CLewR, a curriculum learning strategy with a restart mechanism that, for the first time, systematically incorporates an easy-to-hard, multi-cycle data scheduling approach into preference optimization. This method effectively mitigates catastrophic forgetting and seamlessly integrates with mainstream preference optimization algorithms such as DPO and IPO. Consistent performance improvements are demonstrated across multiple large language models, including Gemma2, Qwen2.5, and Llama3.1. The implementation code has been made publicly available.

📝 Abstract
Large language models (LLMs) have demonstrated competitive performance in zero-shot multilingual machine translation (MT). Some follow-up works further improved MT performance via preference optimization, but they leave a key aspect largely underexplored: the order in which data samples are given during training. We address this topic by integrating curriculum learning into various state-of-the-art preference optimization algorithms to boost MT performance. We introduce a novel curriculum learning strategy with restarts (CLewR), which reiterates easy-to-hard curriculum multiple times during training to effectively mitigate the catastrophic forgetting of easy examples. We demonstrate consistent gains across several model families (Gemma2, Qwen2.5, Llama3.1) and preference optimization techniques. We publicly release our code at https://github.com/alexandra-dragomir/CLewR.
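The abstract describes the core scheduling idea: order preference pairs easy-to-hard and restart the curriculum several times during training so easy examples are revisited. A minimal sketch of that data-ordering logic, assuming a per-sample difficulty score (function and parameter names here are illustrative, not the paper's actual API):

```python
# Hypothetical sketch of curriculum learning with restarts (CLewR-style
# scheduling): sort samples easy-to-hard, then repeat the full curriculum
# for a fixed number of restart cycles. Names are illustrative only.

def clewr_schedule(samples, difficulty, restarts=3):
    """Yield samples easy-to-hard, repeating the curriculum `restarts` times.

    samples: list of training examples (e.g. preference pairs)
    difficulty: callable mapping a sample to a score (lower = easier)
    restarts: number of easy-to-hard cycles over the data
    """
    ordered = sorted(samples, key=difficulty)
    for _ in range(restarts):
        for sample in ordered:
            yield sample

# Toy usage: 4 samples tagged with a made-up difficulty score.
data = [("hard", 0.9), ("easy", 0.1), ("mid", 0.5), ("hardest", 1.0)]
order = [s[0] for s in clewr_schedule(data, difficulty=lambda s: s[1], restarts=2)]
# Each cycle presents easy -> mid -> hard -> hardest, repeated twice,
# so easy examples are revisited rather than seen only once at the start.
```

In a real setup the difficulty score and the number of restarts are design choices (the paper pairs this schedule with algorithms such as DPO and IPO); this sketch only shows the ordering, not the optimization step.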
Problem

Research questions and friction points this paper is trying to address.

machine translation
preference learning
curriculum learning
data ordering
catastrophic forgetting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curriculum Learning
Preference Optimization
Machine Translation
Catastrophic Forgetting
Training Strategy
Alexandra Dragomir
Bitdefender, Bucharest, Romania
Florin Brad
Bitdefender
Natural Language Processing
Neural Networks
R. Ionescu
Department of Computer Science, University of Bucharest, Romania