🤖 AI Summary
To address the misalignment between maximum a posteriori (MAP) decoding and actual translation quality in neural machine translation (NMT), this paper proposes source-side Minimum Bayes Risk (sMBR) decoding. Unlike conventional MBR, sMBR constructs the utility function entirely from source-side signals—specifically, paraphrased or back-translated pseudo-sources—together with a reference-free quality estimation model, thereby optimizing expected translation quality without requiring target-side pseudo-references or human references. Key technical components include source-side paraphrasing/back-translation, reference-free quality scoring, Monte Carlo sampling, and integration into an MBR decision framework. Evaluated on multilingual NMT benchmarks, sMBR consistently outperforms standard MBR decoding and quality estimation (QE)-based re-ranking across BLEU, COMET, and TER metrics. These results empirically validate the strong discriminative power and practical efficacy of source-side signals during decoding.
📝 Abstract
Maximum a posteriori decoding, a commonly used method for neural machine translation (NMT), aims to maximize the estimated posterior probability. However, a high estimated probability does not always lead to high translation quality. Minimum Bayes Risk (MBR) decoding (Kumar and Byrne, 2004) offers an alternative by seeking hypotheses with the highest expected utility. Inspired by Quality Estimation (QE) reranking, which uses a QE model as a ranker (Fernandes et al., 2022), we propose source-based MBR (sMBR) decoding, a novel approach that utilizes quasi-sources (generated via paraphrasing or back-translation) as "support hypotheses" and a reference-free quality estimation metric as the utility function, marking the first work to use only sources in MBR decoding. Experiments show that sMBR outperforms QE reranking and standard MBR decoding. Our findings suggest that sMBR is a promising approach for NMT decoding.
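The decision rule described above — scoring each candidate translation by its expected utility over quasi-sources rather than over sampled translations — can be sketched as follows. This is a minimal illustration, not the paper's implementation: `qe_score` is a toy stand-in for a learned reference-free QE metric (in practice something like a COMET-QE-style model scoring a source–hypothesis pair), and the function names are hypothetical.

```python
# Sketch of source-based MBR (sMBR) candidate selection.
# Assumption: a reference-free QE model scores (source, hypothesis) pairs;
# here we substitute a toy token-overlap proxy purely for illustration.

def qe_score(source: str, hypothesis: str) -> float:
    # Toy stand-in for a reference-free QE metric (NOT the real model):
    # Jaccard overlap between source and hypothesis tokens.
    src, hyp = set(source.split()), set(hypothesis.split())
    return len(src & hyp) / max(len(src | hyp), 1)

def smbr_select(quasi_sources: list[str], candidates: list[str]) -> str:
    """Return the candidate with the highest expected utility, where the
    expectation is a Monte Carlo average of QE scores over quasi-sources
    (paraphrases or back-translations acting as "support hypotheses")."""
    def expected_utility(hyp: str) -> float:
        return sum(qe_score(s, hyp) for s in quasi_sources) / len(quasi_sources)
    return max(candidates, key=expected_utility)
```

Note the contrast with standard MBR, where the utility is averaged over sampled target-side translations; here the support set lives entirely on the source side, so no pseudo-references are needed.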