🤖 AI Summary
Machine translation (MT) quality evaluation traditionally relies on bilingual assessments, which are costly and less reflective of real-world monolingual user scenarios. Method: This study proposes and empirically validates a context-aware monolingual human evaluation paradigm. A context-enhanced evaluation framework is designed in which professional translators assign quality scores and annotate errors in the target language only, supplemented by qualitative feedback and statistical testing (p < 0.05) against bilingual baselines. Contribution/Results: Monolingual evaluation achieves high agreement with bilingual evaluation across system-level scoring, pairwise comparisons, and error-type distributions, and improves evaluation efficiency by ~40% while better approximating authentic user conditions. This work provides the first empirical validation of high-fidelity monolingual MT evaluation, establishing a scalable, cost-effective, and practically viable methodology for MT quality assessment.
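To make the notion of "agreement with bilingual evaluation" concrete, the sketch below computes correlation between monolingual and bilingual quality scores for the same MT outputs. The scores, the 0-100 scale, and the choice of Pearson/Spearman statistics are illustrative assumptions, not details reported by the paper.

```python
# Illustrative sketch (not from the paper): agreement between monolingual
# and bilingual quality scores assigned to the same MT outputs.
# Scores are made-up placeholders on an assumed 0-100 scale.
from scipy.stats import pearsonr, spearmanr

monolingual_scores = [78, 85, 62, 90, 71, 88, 55, 80]   # hypothetical target-only ratings
bilingual_scores   = [75, 88, 60, 92, 70, 85, 58, 83]   # hypothetical ratings with source text

r, p_r = pearsonr(monolingual_scores, bilingual_scores)      # linear agreement
rho, p_rho = spearmanr(monolingual_scores, bilingual_scores)  # rank agreement

print(f"Pearson r = {r:.2f} (p = {p_r:.3f})")
print(f"Spearman rho = {rho:.2f} (p = {p_rho:.3f})")
```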
📝 Abstract
This paper explores the potential of context-aware monolingual human evaluation for assessing machine translation (MT) when no source text is available for reference. To this end, we compare monolingual evaluation with bilingual evaluation (with source text) under two scenarios: the evaluation of a single MT system, and the pairwise comparison of two MT systems. Four professional translators performed both monolingual and bilingual evaluations by assigning ratings, annotating errors, and providing feedback on their experience. Our findings indicate that context-aware monolingual human evaluation achieves outcomes comparable to bilingual evaluation, suggesting that monolingual evaluation is a feasible and efficient approach to assessing MT quality.
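For the pairwise scenario, agreement between the two protocols could be summarized as the proportion of items on which both prefer the same system, optionally chance-corrected. A minimal sketch follows, assuming made-up preference labels and Cohen's kappa as the agreement statistic; the paper's actual statistic and data are not specified here.

```python
# Illustrative sketch (not from the paper): do monolingual and bilingual
# evaluations prefer the same system in pairwise comparisons?
# Preference labels ("A" or "B" per item) are made-up placeholders.
from sklearn.metrics import cohen_kappa_score

mono_prefs = ["A", "A", "B", "A", "B", "A", "A", "B", "A", "A"]  # hypothetical
bili_prefs = ["A", "A", "B", "A", "A", "A", "A", "B", "A", "A"]  # hypothetical

# Raw agreement: fraction of items where both protocols pick the same winner.
agreement = sum(m == b for m, b in zip(mono_prefs, bili_prefs)) / len(mono_prefs)
# Chance-corrected agreement between the two sets of preferences.
kappa = cohen_kappa_score(mono_prefs, bili_prefs)

print(f"Raw agreement: {agreement:.0%}")
print(f"Cohen's kappa: {kappa:.2f}")
```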