🤖 AI Summary
Neural machine translation (NMT) systems amplify lexical biases present in their training data, producing artificially impoverished language that differs from both text originally written in a language and human translations, which limits their usefulness in, for example, building evaluation datasets. Existing attempts to increase naturalness often sacrifice content preservation. Inspired by the reinforcement learning from human feedback (RLHF) framework, the authors introduce a reward that jointly targets naturalness and content preservation, and experiment with multiple perspectives to reduce both machine and human translationese. Evaluated on English-to-Dutch literary translation, their best model produces translations that are lexically richer and exhibit more properties of human-written language, without loss in translation accuracy.
📝 Abstract
Neural machine translation (NMT) systems amplify lexical biases present in their training data, leading to artificially impoverished language in output translations. These language-level characteristics render automatic translations different from both text originally written in a language and human translations, which hinders their usefulness in, for example, creating evaluation datasets. Attempts to increase naturalness in NMT can fall short in terms of content preservation, where increased lexical diversity comes at the cost of translation accuracy. Inspired by the reinforcement learning from human feedback framework, we introduce a novel method that rewards both naturalness and content preservation. We experiment with multiple perspectives to produce more natural translations, aiming at reducing machine and human translationese. We evaluate our method on English-to-Dutch literary translation, and find that our best model produces translations that are lexically richer and exhibit more properties of human-written language, without loss in translation accuracy.
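The core idea of rewarding both naturalness and content preservation can be sketched as a weighted dual-objective reward. The scoring functions below are illustrative stand-ins (type-token ratio as a crude naturalness proxy, unigram overlap as a crude fidelity proxy), not the paper's actual metrics, and the weighting parameter `alpha` is hypothetical:

```python
def naturalness_score(translation: str) -> float:
    """Illustrative proxy for naturalness: type-token ratio as a lexical-diversity signal."""
    tokens = translation.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def content_score(translation: str, reference: str) -> float:
    """Illustrative proxy for content preservation: unigram overlap with a reference."""
    t = set(translation.lower().split())
    r = set(reference.lower().split())
    return len(t & r) / len(r) if r else 0.0

def dual_objective_reward(translation: str, reference: str, alpha: float = 0.5) -> float:
    """Weighted combination of the two objectives; alpha trades off
    naturalness against fidelity when scoring candidate translations."""
    return (alpha * naturalness_score(translation)
            + (1 - alpha) * content_score(translation, reference))
```

In an RLHF-style loop, a reward of this shape would score sampled translations during policy optimization, so that neither lexical diversity nor accuracy dominates the update signal.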