Lost in Literalism: How Supervised Training Shapes Translationese in LLMs

📅 2025-03-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper identifies supervised fine-tuning (SFT) as the primary source of translationese, i.e., overly literal and unnatural output, in large language model (LLM)-based machine translation. Through systematic human evaluation and automatic metrics (BLEU, COMET, BERTScore), the authors demonstrate empirically that translationese stems from data bias inherent in the SFT stage. To address this, they propose two strategies: (1) reference polishing, which improves target-language fluency by refining golden reference translations, and (2) unnatural sample filtering, which removes low-quality, non-fluent instances from the training set. Experiments show that both approaches significantly reduce translationese across multiple automatic metrics and human evaluations, improving naturalness and target-language consistency, and together they establish a reproducible data-curation and training recipe for LLM-based translation systems.
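The second strategy, unnatural sample filtering, can be pictured as scoring each source-target pair for naturalness and dropping pairs below a threshold. The paper scores naturalness with model- and human-based judgments; the sketch below substitutes a toy length-ratio heuristic as a stand-in scorer (`naturalness_score`, `filter_unnatural`, and the threshold are illustrative assumptions, not the authors' implementation).

```python
def naturalness_score(source: str, target: str) -> float:
    """Toy proxy for naturalness: penalize targets whose token count
    diverges sharply from the source, a rough symptom of padded or
    overly literal output. Stand-in for a real LLM/human judgment."""
    src_len, tgt_len = len(source.split()), len(target.split())
    if src_len == 0 or tgt_len == 0:
        return 0.0
    # 1.0 means balanced lengths; values near 0 flag suspicious pairs.
    return min(src_len, tgt_len) / max(src_len, tgt_len)


def filter_unnatural(pairs: list, threshold: float = 0.5) -> list:
    """Keep only (source, target) pairs whose score clears the threshold."""
    return [(s, t) for s, t in pairs if naturalness_score(s, t) >= threshold]


pairs = [
    ("the cat sat on the mat", "le chat était assis sur le tapis"),
    ("hello", "bonjour je vous remercie infiniment pour tout cela vraiment"),
]
kept = filter_unnatural(pairs)  # only the balanced first pair survives
```

Any scorer with the same signature (e.g., a fluency classifier or an LLM judge) could be swapped in without changing the filtering loop.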

📝 Abstract
Large language models (LLMs) have achieved remarkable success in machine translation, demonstrating impressive performance across diverse languages. However, translationese, characterized by overly literal and unnatural translations, remains a persistent challenge in LLM-based translation systems. Despite their pre-training on vast corpora of natural utterances, LLMs exhibit translationese errors and generate unexpected unnatural translations, stemming from biases introduced during supervised fine-tuning (SFT). In this work, we systematically evaluate the prevalence of translationese in LLM-generated translations and investigate its roots during supervised training. We introduce methods to mitigate these biases, including polishing golden references and filtering unnatural training instances. Empirical evaluations demonstrate that these approaches significantly reduce translationese while improving translation naturalness, validated by human evaluations and automatic metrics. Our findings highlight the need for training-aware adjustments to optimize LLM translation outputs, paving the way for more fluent and target-language-consistent translations. We release the data and code at https://github.com/yafuly/LLM_Translationese.
Problem

Research questions and friction points this paper is trying to address.

Addresses translationese in LLM-based translation systems
Investigates biases from supervised fine-tuning causing unnatural translations
Proposes methods to reduce translationese and improve translation naturalness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Polishing golden references to reduce translationese
Filtering unnatural training instances for better naturalness
Training-aware adjustments for fluent target-language translations
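The first innovation, polishing golden references, amounts to rewriting each reference for target-language fluency before the pair enters the SFT corpus. In the paper this rewriting is done by a strong LLM; in the hedged sketch below, `polish_reference` is a runnable stub (it only normalizes whitespace) marking where the model call would go, and all names are hypothetical.

```python
def polish_reference(source: str, reference: str) -> str:
    """Placeholder for an LLM polishing step, e.g. a prompt like:
    'Rewrite this translation so it reads naturally in the target
    language while preserving the meaning of the source.'
    Stub behavior: collapse redundant whitespace so the example runs."""
    return " ".join(reference.split())


def build_sft_corpus(pairs: list) -> list:
    """Replace raw golden references with polished ones before fine-tuning."""
    return [(src, polish_reference(src, ref)) for src, ref in pairs]


corpus = build_sft_corpus([("你好，世界", "Hello ,   world")])
```

The training loop itself is unchanged; only the target side of the data is rewritten, which is what makes this a data-curation fix rather than a modeling change.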