Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization

📅 2024-09-26
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multilingual neural machine translation (MNMT) suffers from a task–data mismatch that degrades performance, particularly on low-resource languages. To address this, the paper proposes Direct Quality Optimization (DQO), a variant of Direct Preference Optimization (DPO) that uses a pre-trained translation quality estimation model (e.g., COMET) as a proxy for human preferences, enabling preference alignment without collecting human annotations. Although alignment is applied to only a subset of language pairs, the improvements generalize to all language directions of the multilingual model. DQO is compatible with standard MNMT architectures and avoids the complexity of reinforcement learning. Experiments across multiple benchmarks show gains in both BLEU and COMET scores, and human evaluation further confirms improvements in translation faithfulness, consistency, and low-resource language performance.

📝 Abstract
Reinforcement Learning from Human Feedback (RLHF) and derivative techniques like Direct Preference Optimization (DPO) are task-alignment algorithms used to repurpose general, foundational models for specific tasks. We show that applying task-alignment to neural machine translation (NMT) addresses an existing task–data mismatch in NMT, leading to improvements across all languages of a multilingual model, even when task-alignment is only applied to a subset of those languages. We do so by introducing Direct Quality Optimization (DQO), a variant of DPO leveraging a pre-trained translation quality estimation model as a proxy for human preferences, and verify the improvements with both automatic metrics and human evaluation.
Problem

Research questions and friction points this paper is trying to address.

Aligning translations with human preferences in cross-lingual neural machine translation
Task–data mismatch in multilingual NMT models
Optimizing translation quality without collecting human preference annotations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Direct Quality Optimization (DQO), a DPO variant for NMT
Leverages a pre-trained quality estimation model as a human-preference proxy
Improvements generalize across all languages of the multilingual model
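The core idea behind DQO, as described in the abstract, can be sketched as a two-step recipe: rank sampled translations with a quality estimation model to build preference pairs, then apply the standard DPO objective. The sketch below is illustrative only — the function names are hypothetical, and the QE model (COMET in the paper) is stood in for by a plain scoring callable; the paper's actual training details may differ.

```python
import math

def dqo_preference_pair(candidates, qe_score):
    """Build a (chosen, rejected) pair from sampled translations,
    ranked by a QE proxy (a stand-in here for a COMET-style model)."""
    ranked = sorted(candidates, key=qe_score, reverse=True)
    return ranked[0], ranked[-1]

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss on sequence log-probabilities:
    -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy usage with mocked QE scores and log-probabilities:
scores = {"good translation": 0.92, "okay translation": 0.55, "poor translation": 0.12}
chosen, rejected = dqo_preference_pair(list(scores), lambda c: scores[c])
loss = dpo_loss(policy_chosen=-1.0, policy_rejected=-3.0,
                ref_chosen=-2.0, ref_rejected=-2.5, beta=0.5)
```

Because the QE model only ranks candidates (it never needs to be backpropagated through), this setup keeps DPO's simplicity while removing the need for human-annotated preference pairs.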
👤 Authors: Kaden Uhlig, Joern Wuebker, Raphael Reinauer, John DeNero (all Lilt)
🏷️ Topic: Machine Translation