🤖 AI Summary
This work addresses the high computational overhead and latency of existing large reasoning models (LRMs) in machine translation, which rely on an explicit “think-then-translate” paradigm to improve output quality. The authors propose ReflectMT, a two-stage reflection internalization algorithm that reverses the order to “translate-then-think.” Using reinforcement learning, the first stage trains the model to reflect on and refine its own translations, and the second stage trains it to internalize what it learned from reflection, so that at inference it generates high-quality translations directly, without explicit intermediate reasoning steps. On benchmarks including WMT24, ReflectMT’s first-pass translations outperform multi-step reasoning models such as DeepSeek-R1 on both automatic metrics and GPT-based evaluation, with a 2.16-point improvement in the GPT-based translation quality score and a 94.33% reduction in token consumption.
📝 Abstract
Recent years have witnessed growing interest in applying Large Reasoning Models (LRMs) to Machine Translation (MT). Existing approaches predominantly adopt a "think-first-then-translate" paradigm. Although explicit reasoning trajectories significantly enhance translation quality, they incur prohibitive inference costs and latency. To address these limitations, we propose ReflectMT, a two-stage reflection internalization algorithm for machine translation that employs a "translate-first-think-later" paradigm. Our approach develops the model's "translate-reflect-refine" capability through reinforcement learning. In the first stage, we cultivate the model's capacity for high-quality reflection and refinement, thereby enhancing its semantic comprehension and task-specific knowledge. In the second stage, we train the model to internalize the knowledge acquired during reflection. As a result, during inference, ReflectMT operates in a direct translation mode, producing high-quality translations on the first attempt without any explicit reasoning steps. Experimental results on datasets such as WMT24 demonstrate that our model's first-pass translations during inference outperform multi-step reasoning LRMs such as DeepSeek-R1 in both automatic metrics and GPT-based evaluation, achieving a 2.16-point improvement in GPT-based translation quality evaluation while reducing token consumption by 94.33%.
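As a reading aid for the reported 94.33% reduction in token consumption, the arithmetic below shows what that figure implies. It is a minimal sketch: only the 94.33% value comes from the abstract, the baseline of 1,000 tokens is a hypothetical number chosen for illustration, and the function names are made up for this example. The reduction is taken relative to whichever baseline the authors measured against, which the abstract does not spell out.

```python
# Only this figure comes from the abstract: token consumption drops by 94.33%.
REPORTED_REDUCTION = 0.9433


def remaining_fraction(reduction: float) -> float:
    """Fraction of the baseline token count that is still generated."""
    return 1.0 - reduction


def compression_ratio(reduction: float) -> float:
    """How many times fewer tokens are generated than the baseline."""
    return 1.0 / remaining_fraction(reduction)


def tokens_after(baseline_tokens: float, reduction: float) -> float:
    """Token count after applying the reduction to a given baseline."""
    return baseline_tokens * remaining_fraction(reduction)


if __name__ == "__main__":
    # Hypothetical baseline, NOT a number from the paper.
    baseline = 1000
    print(f"remaining fraction: {remaining_fraction(REPORTED_REDUCTION):.4f}")
    print(f"compression ratio:  {compression_ratio(REPORTED_REDUCTION):.2f}x")
    print(f"tokens after:       {tokens_after(baseline, REPORTED_REDUCTION):.1f}")
```

In other words, a 94.33% reduction leaves about 5.67% of the baseline tokens, which is roughly 17.6 times fewer tokens per translation.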