Unlocking Reasoning Capability on Machine Translation in Large Language Models

📅 2026-02-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work examines why current reasoning-oriented large language models underperform on machine translation: their generic reasoning strategies lack task-specific structure, so enabling explicit reasoning consistently degrades translation quality, and the resulting traces are largely linear, with little revision, self-correction, or exploration of alternative translations. To address this, the authors propose a structured reasoning framework tailored to translation, comprising multi-step drafting, adequacy refinement, fluency improvement, and selective iterative revision. They curate a synthetic dataset of dynamic structured reasoning traces in this format and use it to post-train a large reasoning model. Evaluated on the WMT24++ benchmark, the resulting model significantly outperforms both standard translation fine-tuning and baselines that inject generic reasoning traces, indicating that reasoning must be task-structured to benefit machine translation.

📝 Abstract

Reasoning-oriented large language models (RLMs) achieve strong gains on tasks such as mathematics and coding by generating explicit intermediate reasoning. However, their impact on machine translation (MT) remains underexplored. We systematically evaluate several open- and closed-weights RLMs on the WMT24++ benchmark and find that enabling explicit reasoning consistently degrades translation quality across languages and models. Analysis reveals that MT reasoning traces are highly linear, lacking revision, self-correction, and exploration of alternative translations, which limits their usefulness. Furthermore, injecting higher-quality reasoning traces from stronger models does not reliably improve weaker models' performance. To address this mismatch, we propose a structured reasoning framework tailored to translation, based on multi-step drafting, adequacy refinement, fluency improvement, and selective iterative revision. We curate a synthetic dataset of dynamic structured reasoning traces and post-train a large reasoning model on this data. Experiments show significant improvements over standard translation fine-tuning and injected generic reasoning baselines. Our findings demonstrate that reasoning must be task-structured to benefit MT.
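
The four stages named in the abstract (drafting, adequacy refinement, fluency improvement, selective iterative revision) can be pictured as a small control loop. The following is only an illustrative sketch, not the authors' implementation: the abstract does not describe prompts, stopping criteria, or the trace format, so `structured_translate`, `Trace`, `max_revisions`, and the toy glossary-based stage functions are all hypothetical stand-ins. In practice each stage would presumably be a model call rather than a dictionary lookup.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Trace:
    """Hypothetical record of one structured reasoning trace."""
    steps: List[Tuple[str, str]]  # (stage name, translation after that stage)
    translation: str              # final translation


def structured_translate(
    source: str,
    draft: Callable[[str], str],
    refine_adequacy: Callable[[str, str], str],
    improve_fluency: Callable[[str], str],
    needs_revision: Callable[[str, str], bool],
    max_revisions: int = 2,  # assumed cap; not specified in the abstract
) -> Trace:
    steps: List[Tuple[str, str]] = []

    # Stage 1: produce an initial draft.
    current = draft(source)
    steps.append(("draft", current))

    # Stage 2: check the draft against the source for meaning errors.
    current = refine_adequacy(source, current)
    steps.append(("adequacy", current))

    # Stage 3: polish the target-language wording.
    current = improve_fluency(current)
    steps.append(("fluency", current))

    # Stage 4: revise again only when the check asks for it.
    revisions = 0
    while revisions < max_revisions and needs_revision(source, current):
        revisions += 1
        current = improve_fluency(refine_adequacy(source, current))
        steps.append(("revision", current))

    return Trace(steps=steps, translation=current)


# Toy stand-ins so the sketch runs without a model (purely illustrative).
DRAFT_GLOSSARY = {"bonjour": "hello"}
FULL_GLOSSARY = {"le": "the", "monde": "world"}


def toy_draft(source: str) -> str:
    return " ".join(DRAFT_GLOSSARY.get(w.lower(), w) for w in source.split())


def toy_adequacy(source: str, translation: str) -> str:
    # Translate any words the draft left in the source language.
    return " ".join(FULL_GLOSSARY.get(w.lower(), w) for w in translation.split())


def toy_fluency(translation: str) -> str:
    if not translation:
        return translation
    text = translation[0].upper() + translation[1:]
    return text if text.endswith(".") else text + "."


trace = structured_translate(
    "Bonjour le monde", toy_draft, toy_adequacy, toy_fluency,
    needs_revision=lambda src, tgt: False,
)
for stage, text in trace.steps:
    print(f"{stage}: {text}")
```

Keeping the revision check as a separate callable is one way to make the revision "selective": the extra passes run only when the check requests them, in contrast to the strictly linear traces the abstract criticizes.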

Problem

Research questions and friction points this paper is trying to address.

machine translation

reasoning capability

large language models

reasoning traces

translation quality

Innovation

Methods, ideas, or system contributions that make the work stand out.

structured reasoning

machine translation

reasoning traces