ExTrans: Multilingual Deep Reasoning Translation via Exemplar-Enhanced Reinforcement Learning

📅 2025-05-19
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses two key challenges: (1) the underutilization of large reasoning models (LRMs) for machine translation in low-resource languages, and (2) the limited capacity of existing reward modeling approaches to fully unleash the potential of reinforcement learning (RL). To this end, the authors propose an exemplar-enhanced multilingual RL framework. The method employs a strong LRM (DeepSeek-R1-671B) as a dynamic, fine-grained evaluator and introduces a novel reward modeling technique based on comparative scoring against its translations. Using Qwen2.5-7B-Instruct as the backbone, the translation policy is optimized via Proximal Policy Optimization (PPO) and then adapted to the multilingual setting with a lightweight reward modeling scheme, efficiently transferring strong single-direction translation ability to 90 translation directions. Experiments demonstrate new state-of-the-art performance on literary translation, surpassing both OpenAI-o1 and DeepSeek-R1; the framework covers 11 languages and significantly improves translation quality for low-resource languages while enhancing multilingual generalization.

๐Ÿ“ Abstract
In recent years, the emergence of large reasoning models (LRMs), such as OpenAI-o1 and DeepSeek-R1, has shown impressive capabilities on complex problems, e.g., mathematics and coding. Some pioneering studies attempt to bring the success of LRMs to neural machine translation (MT), building LRMs with deep reasoning MT ability via reinforcement learning (RL). Despite the progress made, these attempts generally focus on a few high-resource languages, e.g., English and Chinese, leaving performance on other languages unclear. Besides, the reward modeling methods in previous work do not fully unleash the potential of reinforcement learning in MT. In this work, we first design a new reward modeling method that compares the translation results of the policy MT model with those of a strong LRM (i.e., DeepSeek-R1-671B), and quantifies the comparisons to provide rewards. Experimental results demonstrate the superiority of this reward modeling method. Using Qwen2.5-7B-Instruct as the backbone, the trained model achieves new state-of-the-art performance in literary translation, and outperforms strong LRMs including OpenAI-o1 and DeepSeek-R1. Furthermore, we extend our method to the multilingual setting with 11 languages. With a carefully designed lightweight reward modeling in RL, we can simply transfer the strong MT ability from a single direction to multiple (i.e., 90) translation directions and achieve impressive multilingual MT performance.
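The core reward idea in the abstract — score the policy model's translation against a strong LRM's translation of the same source and quantify the comparison — can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's implementation: the `judge` callable, the 0–100 score scale, and the gap-to-reward mapping are all assumptions (in the paper, the evaluator role is played by DeepSeek-R1-671B).

```python
def comparative_reward(policy_translation: str,
                       lrm_translation: str,
                       source: str,
                       judge) -> float:
    """Quantify how the policy's translation compares to a strong LRM's
    translation of the same source (hypothetical scoring scheme)."""
    # The judge returns a quality score in [0, 100] for each candidate;
    # here it is any callable with this signature (an assumption).
    s_policy = judge(source, policy_translation)
    s_lrm = judge(source, lrm_translation)
    # Map the score gap to a bounded reward: 0 when the policy matches
    # the strong LRM, positive when better, negative when worse.
    return max(-1.0, min(1.0, (s_policy - s_lrm) / 100.0))

# Toy stand-in judge for demonstration: scores by translation length only.
def toy_judge(source: str, translation: str) -> float:
    return float(min(100, 10 * len(translation.split())))

# Policy output vs. strong-LRM output for the same source sentence.
reward = comparative_reward("buenos dias", "hola", "good morning", toy_judge)
```

The interesting design choice this captures is that the reward is relative rather than absolute: the strong LRM's translation acts as a moving exemplar baseline, so the policy is pushed to match or exceed a high-quality reference instead of chasing a fixed metric.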
Problem

Research questions and friction points this paper is trying to address.

Enhancing multilingual translation via exemplar-enhanced reinforcement learning
Improving reward modeling for better reinforcement learning in machine translation
Extending strong translation ability to multiple languages efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Exemplar-enhanced reinforcement learning for translation
New reward modeling with LRM comparison
Lightweight reward modeling for multilingual transfer