ReflectMT: Internalizing Reflection for Efficient and High-Quality Machine Translation

📅 2026-04-21

🤖 AI Summary
This work addresses the high computational overhead and latency of large reasoning models in machine translation, which rely on a multi-step “think-first-then-translate” paradigm to improve output quality. The authors propose ReflectMT, a two-stage reflection internalization algorithm that reverses the order to “translate-first-think-later.” Using reinforcement learning, the first stage trains the model to reflect on and refine its own translations, and the second stage internalizes that capability so that the model generates high-quality translations on the first attempt at inference time, without explicit intermediate reasoning steps. On benchmarks including WMT24, ReflectMT's first-pass translations outperform multi-step reasoning models such as DeepSeek-R1, with a 2.16-point improvement in GPT-based evaluation scores and a 94.33% reduction in token consumption.

📝 Abstract
Recent years have witnessed growing interest in applying Large Reasoning Models (LRMs) to Machine Translation (MT). Existing approaches predominantly adopt a "think-first-then-translate" paradigm. Although explicit reasoning trajectories significantly enhance translation quality, they incur prohibitive inference costs and latency. To address these limitations, we propose ReflectMT, a two-stage reflection internalization algorithm for machine translation that employs a "translate-first-think-later" paradigm. Our approach develops the model's "translate-reflect-refine" capability through reinforcement learning. In the first stage, we cultivate the model's capacity for high-quality reflection and refinement, thereby enhancing its semantic comprehension and task-specific knowledge. In the second stage, we train the model to internalize the knowledge acquired during reflection. As a result, during inference, ReflectMT operates in a direct translation mode, producing high-quality translations on the first attempt without any explicit reasoning steps. Experimental results on datasets such as WMT24 demonstrate that our model's first-pass translations during inference outperform multi-step reasoning LRMs such as DeepSeek-R1 in both automatic metrics and GPT-based evaluation, achieving a 2.16-point improvement in GPT-based translation quality evaluation while reducing token consumption by 94.33%.
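
To put the reported 94.33% token reduction in concrete terms, the arithmetic can be sketched as follows. This is an illustrative calculation only: the reduction percentage comes from the abstract, while `BASELINE_TOKENS` is a hypothetical per-sentence token count for a multi-step reasoning model, not a figure from the paper, and the sketch assumes the reduction is measured against such a baseline.

```python
def remaining_fraction(reduction_pct: float) -> float:
    """Fraction of the baseline token count left after a percentage reduction."""
    return 1.0 - reduction_pct / 100.0


REDUCTION_PCT = 94.33    # token reduction reported in the abstract
BASELINE_TOKENS = 1000   # hypothetical baseline cost, chosen for illustration

fraction = remaining_fraction(REDUCTION_PCT)
tokens_used = BASELINE_TOKENS * fraction   # tokens left after the reduction
ratio = 1.0 / fraction                     # how many times fewer tokens are used

print(f"remaining fraction: {fraction:.4f}")
print(f"tokens used per {BASELINE_TOKENS} baseline tokens: {tokens_used:.1f}")
print(f"baseline uses about {ratio:.2f}x as many tokens")
```

Under these assumptions, a 94.33% reduction leaves about 5.67% of the baseline token count, so the baseline consumes roughly 17.6 times as many tokens per translation.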

Problem

Research questions and friction points this paper is trying to address.

Machine Translation
Large Reasoning Models
Inference Efficiency
Translation Latency
Explicit Reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reflection Internalization
Machine Translation
Large Reasoning Models
Reinforcement Learning
Efficient Inference