🤖 AI Summary
Synthetic data in Machine Translation Quality Estimation (QE) suffers from distributional shift: a mismatch between pseudo-translations and authentic translations, and a misalignment between pseudo-labels and human preferences. Method: We propose ADSQE, a novel framework that integrates constrained beam search, multi-model collaborative generation, reference-guided word-level annotation, and error-propagation-based phrase-level label inference. Crucially, it prohibits translation models from self-evaluating their own outputs, avoiding circular bias. Contribution/Results: ADSQE is the first to leverage reference translations to guide both synthetic data generation and fine-grained annotation, and it introduces a shortest-error-phrase identification mechanism aligned with human annotator behavior. Experiments demonstrate that ADSQE consistently outperforms state-of-the-art methods, including COMET, on both supervised and unsupervised QE benchmarks. Moreover, it significantly enhances the efficacy of synthetic data for reward model training.
📝 Abstract
Quality Estimation (QE) models evaluate the quality of machine translations without reference translations, serving as reward models for the translation task. Due to data scarcity, synthetic data generation has emerged as a promising solution. However, synthetic QE data often suffers from distribution shift, which can manifest as discrepancies between pseudo and real translations, or as pseudo labels that do not align with human preferences. To tackle this issue, we introduce ADSQE, a novel framework for alleviating distribution shift in synthetic QE data. To reduce the difference between pseudo and real translations, we employ the constrained beam search algorithm and enhance translation diversity through the use of distinct generation models. ADSQE uses references, i.e., translation supervision signals, to guide both the generation and annotation processes, enhancing the quality of word-level labels. ADSQE further identifies the shortest phrase covering consecutive error tokens, mimicking human annotation behavior, to assign the final phrase-level labels. Notably, we underscore that a translation model cannot accurately annotate its own translations. Extensive experiments demonstrate that ADSQE outperforms SOTA baselines like COMET in both supervised and unsupervised settings. Further analysis offers insights into synthetic data generation that could benefit reward models for other tasks.
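The phrase-level labeling step described above (finding the shortest phrase covering consecutive error tokens) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the paper's actual implementation: the function name, the WMT-style `OK`/`BAD` word-level tags, and the span representation are all placeholders for whatever ADSQE uses internally.

```python
def error_phrase_spans(word_labels):
    """Group runs of consecutive "BAD" word-level tags into the shortest
    contiguous phrase spans, mimicking how a human annotator marks a
    minimal error phrase rather than scattered single tokens.

    word_labels: list of "OK"/"BAD" tags, one per translation token
                 (assumed format; the paper may use a different scheme).
    Returns a list of (start, end) token index pairs, end-exclusive.
    """
    spans = []
    start = None  # start index of the currently open error phrase, if any
    for i, tag in enumerate(word_labels):
        if tag == "BAD" and start is None:
            start = i  # open a new error phrase at the first bad token
        elif tag == "OK" and start is not None:
            spans.append((start, i))  # close the phrase before this OK token
            start = None
    if start is not None:  # phrase running to the end of the sentence
        spans.append((start, len(word_labels)))
    return spans
```

Each returned span is by construction the shortest interval covering its run of error tokens, since it opens at the first `BAD` tag and closes immediately at the next `OK` tag.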