🤖 AI Summary
Neural machine translation from Chinese to low-resource Southeast Asian languages has long been hindered by the scarcity of parallel corpora and the high noise levels in mined data. This work addresses these challenges with a Chinese-centric low-resource translation framework that integrates language-specific token prefixing (LTP), supervised fine-tuning (SFT), and group relative policy optimization (GRPO) augmented with a semantic alignment reward (SAR), thereby overcoming the limitations of relying on model scale alone. The study further adapts the ALT benchmark into a Chinese-centric evaluation suite and demonstrates substantial improvements over existing large models on languages such as Lao, Burmese, and Tagalog, validating the efficacy of high-quality data filtering and reward-guided optimization.
📄 Abstract
Neural machine translation (NMT) from Chinese to low-resource Southeast Asian languages remains severely constrained by the extreme scarcity of clean parallel corpora and the pervasive noise in existing mined data. This chronic shortage not only impedes effective model training but also sustains a large performance gap with high-resource directions, leaving millions of speakers of languages such as Lao, Burmese, and Tagalog with persistently low-quality translation systems despite recent advances in large multilingual models. We introduce Multilingual Expert-Reward Informed Tuning (MERIT), a unified translation framework that transforms the traditional English-centric ALT benchmark into a Chinese-centric evaluation suite for five Southeast Asian low-resource languages (LRLs). Our framework combines language-specific token prefixing (LTP) with supervised fine-tuning (SFT) and a novel group relative policy optimization (GRPO) method guided by a semantic alignment reward (SAR). Experiments on this benchmark confirm that, for LRL→Chinese translation, targeted data curation and reward-guided optimization dramatically outperform mere model scaling.
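To make the two core ingredients concrete, here is a minimal sketch of (1) language-specific token prefixing, where a target-language tag is prepended to the source so one model serves several directions, and (2) a GRPO-style signal, where rewards for a group of sampled translations are standardized within the group to form relative advantages. The tag names, the toy token-overlap reward standing in for SAR, and all function names are illustrative assumptions, not the paper's actual implementation.

```python
from statistics import mean, pstdev

# Hypothetical target-language tags for Lao, Burmese, and Tagalog.
LANG_TAGS = {"lo": "<2lo>", "my": "<2my>", "tl": "<2tl>"}

def add_language_prefix(src: str, tgt_lang: str) -> str:
    """Language-specific token prefixing: prepend the target-language tag."""
    return f"{LANG_TAGS[tgt_lang]} {src}"

def semantic_alignment_reward(hyp: str, ref: str) -> float:
    """Toy stand-in for SAR: token-overlap F1 between hypothesis and reference.
    The real reward would use a semantic (embedding-based) similarity."""
    h, r = set(hyp.split()), set(ref.split())
    if not h or not r:
        return 0.0
    overlap = len(h & r)
    p, q = overlap / len(h), overlap / len(r)
    return 0.0 if p + q == 0 else 2 * p * q / (p + q)

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style normalization: each sample's advantage is its reward
    standardized against the group mean and std (no value network)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]

# Usage: score a sampled group of candidate translations against one reference.
src = add_language_prefix("hello", "lo")          # "<2lo> hello"
ref = "the river is wide"
group = ["the river is wide", "the river wide", "a cat sleeps"]
rewards = [semantic_alignment_reward(h, ref) for h in group]
advs = group_relative_advantages(rewards)
```

Group-relative normalization means better-than-average samples in the group get positive advantages and worse ones get negative advantages, which is what lets GRPO dispense with a learned critic.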