Deep Reasoning Translation via Reinforcement Learning

📅 2025-04-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing deep-reasoning large language models (LLMs) lack joint modeling capabilities for cultural adaptation and reasoning processes in free translation—a culturally sensitive multilingual task. Method: We propose DeepTrans, a zero-annotation deep-reasoning translation framework featuring a novel two-tier reward mechanism that simultaneously optimizes both translation outputs and reasoning chains, incorporating cultural appropriateness, logical coherence, and semantic fidelity. Built upon Qwen2.5-7B, it employs a reinforcement learning (RL) training framework augmented with chain-of-thought guidance strategies. Contribution/Results: On literary translation benchmarks, DeepTrans achieves a 16.3% improvement over strong baselines—including OpenAI o1, DeepSeek-R1, and synthetic-data fine-tuning methods—demonstrating significant progress in cross-cultural generalization. Furthermore, our analysis uncovers critical failure modes in RL training, offering insights into robustness limitations and optimization challenges in culturally grounded translation.

📝 Abstract
Recently, deep-reasoning LLMs (e.g., OpenAI o1/o3 and DeepSeek-R1) have shown promising performance on various complex tasks. Free translation is an important task in the multilingual world: it requires going beyond word-for-word translation and taking cultural differences into account, yet it remains under-explored in deep-reasoning LLMs. In this paper, we introduce DeepTrans, a deep-reasoning translation model that learns free translation via reinforcement learning (RL). Specifically, we carefully build a reward model with pre-defined scoring criteria covering both the translation results and the thought process. Given the source sentences, the reward model teaches DeepTrans how to think about and free-translate them during RL. In this way, training DeepTrans requires no labeled translations, avoiding both human-intensive annotation and resource-intensive data synthesis. Experimental results show the effectiveness of DeepTrans: using Qwen2.5-7B as the backbone, it improves performance by 16.3% on literary translation and outperforms strong deep-reasoning baselines as well as baselines fine-tuned on synthesized data. Moreover, we summarize the failures and interesting findings from our RL exploration, which we hope can inspire further research on free translation.
Problem

Research questions and friction points this paper is trying to address.

Exploring free translation in deep reasoning LLMs
Developing DeepTrans via reinforcement learning
Improving translation without labeled data
Innovation

Methods, ideas, or system contributions that make the work stand out.

DeepTrans uses reinforcement learning for translation
Reward model guides translation and thought process
Training avoids labeled data, reducing human effort
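The two-tier reward described above can be sketched as a small scoring function. This is an illustrative sketch only, not the paper's implementation: the names (`RewardBreakdown`, `two_tier_reward`) and the weights are assumptions. The key idea it shows is that a judge scores the reasoning chain and the translation separately, and the combined scalar reward supervises RL without any reference translation.

```python
# Hypothetical sketch of a two-tier reward combining a thought-process
# score and a translation score into a single RL reward signal.
# Names and weights are illustrative assumptions, not from the paper.

from dataclasses import dataclass


@dataclass
class RewardBreakdown:
    thought_score: float      # logical coherence of the reasoning chain, in [0, 1]
    translation_score: float  # cultural fit + semantic fidelity, in [0, 1]


def two_tier_reward(breakdown: RewardBreakdown,
                    thought_weight: float = 0.3,
                    translation_weight: float = 0.7) -> float:
    """Combine both scores into one scalar reward for the RL update.

    No labeled reference translation is involved: both scores come from
    a reward model applying pre-defined scoring criteria.
    """
    return (thought_weight * breakdown.thought_score
            + translation_weight * breakdown.translation_score)


# Example: a coherent thought process paired with a good free translation.
reward = two_tier_reward(RewardBreakdown(thought_score=0.9, translation_score=0.8))
```

Weighting the translation more heavily than the thought process is one plausible choice; the paper defines its own scoring criteria for each tier.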