AI Summary
This study investigates iterative self-refinement by large language models (LLMs) in document-level literary translation, a process that remains poorly understood. The authors systematically evaluate nine LLMs across seven language pairs, comparing translation-and-refinement configurations at multiple granularities and several prompting strategies through large-scale human assessment and experiments that vary model strength. The findings indicate that document-level translation followed by segment-level refinement yields robust performance, and that a generic refinement prompt consistently outperforms error-focused and evaluate-then-refine approaches. Refinement primarily improves fluency, stylistic coherence, and terminological consistency, with limited gains in adequacy. Moreover, refinement tends to pull outputs toward the refiner's own distribution rather than performing targeted error correction.
Abstract
Iterative self-refinement is a simple inference-time strategy for machine translation: an LLM revises its own translation over multiple passes. Yet document-scale refinement remains poorly understood: 1) which pipelines work best, 2) what quality dimensions improve, and 3) how refiners behave. In this paper, we present a systematic study of document-level literary translation, covering nine LLMs and seven language pairs. Across nine translation-refinement granularity combinations and five refinement strategies, we find a robust recipe: document-level MT followed by segment-level refinement yields strong and stable improvements. In contrast, document-level refinement often makes fewer edits and leads to smaller or less reliable gains. Beyond granularity, a simple general refinement prompt consistently outperforms error-specific prompting and evaluate-then-refine schemes. Our large-scale human evaluation shows that refinement gains come primarily from fluency, style, and terminology, with limited and less consistent improvements in adequacy. Experiments varying model strength reveal that refinement projects outputs toward the refiner's distribution rather than performing targeted error repair. These findings clarify the mechanisms and limitations of current refinement approaches.
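The recipe described in the abstract (document-level MT followed by segment-level refinement with a general prompt) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `llm` is a hypothetical callable that maps a prompt string to a completion string, the prompt wording is invented for the example, segments are assumed to be single lines, and each segment is refined using only its own source segment as context.

```python
from typing import Callable, List

# Hypothetical interface: any function that takes a prompt and returns text.
LLM = Callable[[str], str]


def translate_document(llm: LLM, source_segments: List[str],
                       src_lang: str, tgt_lang: str) -> List[str]:
    """Document-level MT: translate all segments in a single call."""
    # Assumed prompt wording; the paper's actual prompts are not given here.
    prompt = (
        f"Translate the following {src_lang} text into {tgt_lang}. "
        "Return exactly one translated line per source line.\n\n"
        + "\n".join(source_segments)
    )
    lines = [ln.strip() for ln in llm(prompt).splitlines() if ln.strip()]
    if len(lines) != len(source_segments):
        # Segment-level refinement needs a one-to-one alignment.
        raise ValueError("translation lost the segment alignment")
    return lines


def refine_segments(llm: LLM, source_segments: List[str],
                    draft_segments: List[str], src_lang: str,
                    tgt_lang: str, passes: int = 1) -> List[str]:
    """Segment-level refinement with a general (not error-specific) prompt."""
    current = list(draft_segments)
    for _ in range(passes):
        for i, src in enumerate(source_segments):
            prompt = (
                f"Improve this {tgt_lang} translation of the {src_lang} "
                "source. Return only the improved translation.\n"
                f"Source: {src}\n"
                f"Translation: {current[i]}"
            )
            revised = llm(prompt).strip()
            if revised:  # keep the previous version if the reply is empty
                current[i] = revised
    return current
```

In use, the draft from `translate_document` would be passed straight to `refine_segments`, with `passes` controlling how many refinement rounds are applied. Swapping in a different `llm` for the second step would correspond to the setting in which the refiner differs in strength from the translator.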