🤖 AI Summary
This study addresses the limited evidence for the usefulness of enhanced post-editing features, despite the wide availability of high-quality machine translation. It integrates error highlights and correction suggestions, derived from large language model (LLM)-based automatic post-editing (APE), into the workflow of professional translators (En-Nl). The approach is compared against regular post-editing and post-editing with quality estimation (QE)-derived highlights in terms of productivity, quality, and user experience. No condition yielded productivity or quality gains over regular post-editing, but APE highlights were better received than QE-derived highlights, and correction suggestions improved overall user experience. The findings suggest that the benefit of integrating LLMs into post-editing currently lies in user experience rather than in output-oriented metrics.
📝 Abstract
As machine translation (MT) quality increases, interest in enhanced post-editing (PE) features such as quality estimation (QE)-derived error highlights is growing, yet evidence for their usefulness remains limited. In this work, we explore the usefulness of LLM-derived error highlights and correction suggestions based on automatic post-editing (APE). We conduct a study in which professional translators (En-Nl) post-edit translations using APE error highlights and correction suggestions, and we compare productivity, quality, and user experience to regular PE and PE with QE-derived highlights. While no condition yielded productivity or quality gains compared to regular PE, APE highlights were better received than QE-derived highlights, and correction suggestions improved overall user experience.