🤖 AI Summary
This work investigates whether generating intermediate "reasoning tokens" helps large reasoning models (LRMs) perform machine translation (MT). Across multiple language pairs of varying resource levels and multiple setups, the authors find that "thinking tokens" do not improve LRMs' translation quality. This also holds for models fine-tuned to reason before translating: fine-tuning on distilled chain-of-thought (CoT) explanations modeled on human translators' step-by-step practices does not outperform standard input-output fine-tuning. Improvements do appear, however, when the intermediate tokens are constructed by combining the outputs of modular, translation-specific prompting strategies; that is, when they contain concrete translation attempts rather than abstract reasoning. The key takeaway is that the contribution of intermediate tokens during fine-tuning depends on whether they include translation attempts, and that using a teacher model to refine target translations or to expand parallel corpora is more impactful than distilling its CoT explanations into "thinking" MT models.
📝 Abstract
Large reasoning models (LRMs) have opened new possibilities for problem-solving by generating a natural language thought process before answering a query. While their capabilities are well established on mathematics and coding tasks, their impact on machine translation (MT) remains underexplored. In this work, we explore the benefits of generating intermediate tokens when performing MT across multiple language pairs of different resource levels and multiple setups. We find that "thinking tokens" do not help LRMs better perform MT. This result generalizes to models fine-tuned to reason before translating using distilled chain of thought (CoT) inspired by human translators' practices. Specifically, fine-tuning a model with synthetic CoT explanations detailing how to translate step-by-step does not outperform standard input-output fine-tuning. However, constructing the intermediate tokens by combining the outputs of modular translation-specific prompting strategies results in improvements. Our findings underscore that the contribution of intermediate tokens during fine-tuning highly depends on the presence of translation attempts within them. More broadly, our results suggest that using a teacher to refine target translations or to expand parallel corpora is more impactful than distilling their CoT explanations into "thinking" MT models.
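The distinction the abstract draws can be made concrete with a minimal sketch of how a fine-tuning example might be assembled: instead of a free-form CoT explanation, the intermediate tokens are built from modular prompting outputs (here, a draft translation plus a refined one) so that they contain actual translation attempts. The function name, field names, and the `<think>` tag format below are illustrative assumptions, not the paper's exact scheme.

```python
def build_training_example(source: str,
                           draft: str,
                           refined: str,
                           final: str) -> dict:
    """Combine modular prompting outputs into intermediate tokens.

    The key property suggested by the abstract: the intermediate
    tokens contain concrete translation attempts (draft, refined),
    not just abstract reasoning about how to translate.
    """
    thinking = (
        f"Draft translation: {draft}\n"
        f"Refined translation: {refined}"
    )
    return {
        "input": source,
        "target": f"<think>\n{thinking}\n</think>\n{final}",
    }

# Hypothetical example pair (French -> English), for illustration only.
example = build_training_example(
    source="Le chat dort sur le canapé.",
    draft="The cat sleeps on the sofa.",
    refined="The cat is sleeping on the sofa.",
    final="The cat is sleeping on the sofa.",
)
```

In this framing, each "module" (drafting, refining, post-editing) could be produced by a separate teacher prompt, and the concatenation becomes the intermediate token sequence the student is trained to emit before the final translation.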