🤖 AI Summary
This study examines whether fluency and fidelity trade off in literary translation, including in output from large language models. Using a corpus of 130,486 translated paragraphs from 106 novels in 16 source languages, the authors quantitatively compare human translations, Google Translate, and TranslateGemma. Fluency is measured as original-likeness by a translationese classifier trained on paragraph-level part-of-speech n-grams, while fidelity is assessed with the COMET-KIWI metric, with paragraph length controlled for. After this control, fluency and fidelity are consistently negatively correlated for human and Google Translate output, whereas the correlation is weaker and often statistically non-significant for TranslateGemma. The results indicate that segment length matters for automatic evaluation and suggest a trade-off between fluency and fidelity in literary translation.
📝 Abstract
Literary translation requires balancing target-language fluency with faithfulness to the source. Recent large language models (LLMs) often produce fluent translations, but it remains unclear whether fluency corresponds to semantic preservation in literary text. We examine this relationship using 130,486 translated paragraphs from 106 novels in 16 source languages, covering human, Google Translate, and TranslateGemma translations. Fluency is measured as original-likeness with a translationese classifier trained on paragraph part-of-speech n-grams, and faithfulness is measured with the automatic translation evaluation metric COMET-KIWI. We control for paragraph length and find a consistent negative correlation between fluency and faithfulness. The pattern appears for both human and Google Translate translations, but is weaker and often non-significant for TranslateGemma. These results show that segment length matters for automatic evaluation and suggest a tradeoff between fluency and faithfulness in literary translation.
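To make "controlling for paragraph length" concrete, here is a minimal sketch of one common way to do it: a partial correlation, computed by regressing both scores on length and correlating the residuals. The abstract does not say which procedure the authors used, so this is an assumption. The function names and the four toy values are purely illustrative and are not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, z):
    """Residuals of y after a least-squares linear fit on the covariate z."""
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    return [b - (my + slope * (a - mz)) for a, b in zip(z, y)]


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    return pearson(residuals(x, z), residuals(y, z))


# Toy values (hypothetical, not from the paper): both scores rise with
# length, but move in opposite directions once length is accounted for.
length = [1.0, 2.0, 3.0, 4.0]
fluency = [2.0, 1.0, 2.0, 5.0]
faithfulness = [0.0, 3.0, 4.0, 3.0]

raw = pearson(fluency, faithfulness)
controlled = partial_corr(fluency, faithfulness, length)
print(f"raw r = {raw:.3f}, length-controlled r = {controlled:.3f}")
```

In this toy case the raw correlation is slightly positive while the length-controlled one is strongly negative, which illustrates why a shared dependence on segment length can hide a trade-off between two scores.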