🤖 AI Summary
This paper investigates how the adequacy–fluency trade-off in machine translation (MT) evaluation affects automatic metric performance and meta-evaluation outcomes. Through correlation analysis and a meta-evaluation framework applied to WMT data, the authors find that mainstream automatic metrics exhibit a systematic adequacy bias, and that their meta-evaluation rankings are highly sensitive to the composition of the participating MT systems, a previously under-addressed source of compositional bias. To address this, the authors propose a synthetic-system-based control method for meta-evaluation: by controllably generating diverse system combinations with calibrated adequacy–fluency profiles, the method mitigates composition-induced distortions. Experiments demonstrate that the approach significantly improves the fairness and robustness of metric rankings. This work is the first to identify and correct the adequacy–fluency trade-off bias at the meta-evaluation level, offering both theoretical insight and practical guidance for building more balanced and trustworthy MT evaluation frameworks.
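To make the adequacy-bias measurement concrete, here is a minimal sketch of the kind of correlation analysis described above, run on synthetic stand-in data. The arrays `adequacy`, `fluency`, and `metric_scores` and all weights are illustrative assumptions, not the paper's data or code:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n = 500  # number of translated segments (illustrative)

# Stand-in data: human adequacy/fluency ratings plus a metric that, by
# construction, leans toward adequacy. All weights here are assumptions.
adequacy = rng.normal(size=n)
fluency = 0.3 * adequacy + rng.normal(scale=0.9, size=n)  # partially correlated
metric_scores = 0.8 * adequacy + 0.2 * fluency + rng.normal(scale=0.3, size=n)

# Correlate the metric with each human dimension separately.
r_adequacy, _ = pearsonr(metric_scores, adequacy)
r_fluency, _ = pearsonr(metric_scores, fluency)

# A positive gap marks an adequacy-leaning metric under the paper's framing.
print(f"corr with adequacy:  {r_adequacy:.3f}")
print(f"corr with fluency:   {r_fluency:.3f}")
print(f"adequacy lean (gap): {r_adequacy - r_fluency:.3f}")
```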
📝 Abstract
We investigate the trade-off between adequacy and fluency in machine translation. We show the severity of this trade-off at the evaluation level and analyze where popular metrics fall within it. We find that current metrics generally lean toward adequacy: their scores correlate more strongly with the adequacy of translations than with their fluency. More importantly, we find that this trade-off also persists at the meta-evaluation level, and that the standard WMT meta-evaluation favors adequacy-oriented metrics over fluency-oriented ones. We show that this bias can be partly attributed to the composition of the systems included in the meta-evaluation datasets. To control for this bias, we propose a method that synthesizes translation systems for meta-evaluation. Our findings highlight the importance of understanding this trade-off in meta-evaluation and its impact on metric rankings.
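The sketch below illustrates, under stated assumptions rather than as the paper's actual method, how system composition can flip a meta-evaluation ranking: two synthetic pools share the same human preference model but differ in whether systems vary mainly in adequacy or mainly in fluency, and the winning metric changes accordingly. All function names, weights, and spreads are hypothetical:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)

def synthesize_pool(n_systems, adequacy_spread, fluency_spread):
    """Synthetic system pool: each system has an adequacy and a fluency level.

    Human quality is modeled as an equal mix of both; only the pool's
    composition (which dimension systems actually differ on) changes.
    """
    adequacy = rng.normal(0.5, adequacy_spread, n_systems)
    fluency = rng.normal(0.5, fluency_spread, n_systems)
    human = 0.5 * adequacy + 0.5 * fluency
    return adequacy, fluency, human

def meta_eval(metric_lean, adequacy, fluency, human):
    """System-level meta-evaluation: metric-human Pearson correlation."""
    metric = metric_lean * adequacy + (1 - metric_lean) * fluency
    return pearsonr(metric, human)[0]

pools = {
    "systems differ mainly in adequacy": (0.30, 0.05),
    "systems differ mainly in fluency":  (0.05, 0.30),
}
for label, (a_spread, f_spread) in pools.items():
    adequacy, fluency, human = synthesize_pool(30, a_spread, f_spread)
    r_adeq = meta_eval(0.9, adequacy, fluency, human)  # adequacy-oriented metric
    r_flu = meta_eval(0.1, adequacy, fluency, human)   # fluency-oriented metric
    winner = "adequacy-oriented" if r_adeq > r_flu else "fluency-oriented"
    print(f"{label}: winner = {winner} ({r_adeq:.2f} vs {r_flu:.2f})")
```

In the first pool the adequacy-oriented metric attains the higher system-level correlation, and in the second the fluency-oriented one does, even though human preferences never change, mirroring the compositional bias the abstract describes and motivating the synthesis of controlled system pools.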