🤖 AI Summary
This study investigates stylistic appropriateness in English-to-Chinese children's literature translation, comparing large language models (LLMs) with neural machine translation (NMT) systems. It builds a parallel corpus of human, LLM, and NMT translations of *Peter Pan* (21 translations in total). Methodologically, it combines a generic feature set (lexical, syntactic, readability, and n-gram features) with a feature set specific to creative text translation (CTT), covering repetition, rhythm, translatability, and miscellaneous features, for 447 stylistic features in all. Classification and clustering analyses then assess how the translation types differ in both general and children's-literature-specific linguistic properties. On the generic features, human and machine translations differ significantly in conjunction distribution and the 1-word-gram-YiYang ratio, while LLM and NMT outputs differ in descriptive-word usage and adverb ratio. On the CTT-specific features, LLM outputs outperform NMT outputs, with stylistic distributions closer to those of human translations. These findings suggest that LLMs have stronger stylistic generation capability and greater practical potential for children's literature translation.
📝 Abstract
This study evaluates machine translations (MTs) against human translations (HTs) in English-to-Chinese children's literature translation (CLT) from a stylometric perspective. It constructs a Peter Pan corpus of 21 translations: 7 HTs, 7 large language model translations (LLMs), and 7 neural machine translation outputs (NMTs). The analysis uses two feature sets: a generic set (lexical, syntactic, readability, and n-gram features) and a creative text translation (CTT)-specific set capturing repetition, rhythm, translatability, and miscellaneous features, yielding 447 linguistic features in total.
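To make the idea of repetition-style stylometric features concrete, here is a minimal Python sketch that computes three simple features for a Chinese text: character type-token ratio, character-bigram repetition ratio, and mean sentence length. These are illustrative stand-ins, not the paper's 447 features; the function names, the CJK-range filter, and the sentence-splitting rule are all hypothetical choices.

```python
import re
from collections import Counter


def char_ngrams(text, n):
    # Keep only CJK Unified Ideographs (a simplifying assumption), then
    # slide an n-character window. N-grams may span sentence boundaries here.
    chars = [c for c in text if "\u4e00" <= c <= "\u9fff"]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def type_token_ratio(text):
    # Distinct characters divided by total characters.
    chars = char_ngrams(text, 1)
    return len(set(chars)) / len(chars) if chars else 0.0


def repetition_ratio(text, n=2):
    # Share of n-gram tokens whose n-gram type occurs more than once.
    grams = char_ngrams(text, n)
    if not grams:
        return 0.0
    counts = Counter(grams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(grams)


def mean_sentence_length(text):
    # Split on Chinese and ASCII sentence-final punctuation; length in characters.
    sentences = [s for s in re.split(r"[。！？!?]", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(char_ngrams(s, 1)) for s in sentences) / len(sentences)


def feature_vector(text):
    # Hypothetical mini feature set standing in for a much larger one.
    return {
        "ttr": type_token_ratio(text),
        "bigram_repetition": repetition_ratio(text, 2),
        "mean_sentence_len": mean_sentence_length(text),
    }


if __name__ == "__main__":
    print(feature_vector("小飞侠飞呀飞。小飞侠飞呀飞！"))
```

A highly repetitive, rhythmic line scores a low type-token ratio and a high bigram repetition ratio, which is the kind of signal a repetition-level feature is meant to capture.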
Using machine-learning classification and clustering techniques, we conduct a stylometric analysis of these translations. For the generic features, HTs and MTs differ significantly in the distribution of conjunctions and in the 1-word-gram-YiYang ratio, while NMTs and LLMs differ significantly in descriptive-word usage and adverb ratio. For the CTT-specific features, LLMs outperform NMTs, with feature distributions that align more closely with those of HTs, demonstrating the potential of LLMs in CLT.
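The abstract does not specify which classifiers or clustering algorithms were used. As a hedged illustration of the general idea, the sketch below standardizes feature vectors, runs leave-one-out nearest-centroid classification over translation-type labels, and measures how far each group lies from the HT centroid. Nearest-centroid is a deliberately simple stand-in, and all function names and the toy data are hypothetical.

```python
import math
from statistics import mean, pstdev


def zscore(vectors):
    # Standardize each feature column across all texts. For simplicity this
    # is fit on every text, including any later held out (slight leakage).
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*vectors)]
    return [[(x - m) / s for x, (m, s) in zip(v, stats)] for v in vectors]


def centroid(vectors):
    return [mean(col) for col in zip(*vectors)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def loo_nearest_centroid_accuracy(vectors, labels):
    # Leave one text out, build class centroids from the rest,
    # and predict the held-out text's label as the nearest centroid.
    correct = 0
    for i, (v, y) in enumerate(zip(vectors, labels)):
        groups = {}
        for j, (w, label) in enumerate(zip(vectors, labels)):
            if j != i:
                groups.setdefault(label, []).append(w)
        cents = {label: centroid(ws) for label, ws in groups.items()}
        pred = min(cents, key=lambda label: euclidean(v, cents[label]))
        correct += pred == y
    return correct / len(vectors)


def mean_distance_to(group_vectors, target):
    # Average distance of a group's texts to a reference point (e.g. HT centroid).
    return mean(euclidean(v, target) for v in group_vectors)


if __name__ == "__main__":
    # Made-up two-feature vectors, two texts per translation type.
    ht = [[0.0, 0.0], [0.1, 0.1]]
    llm = [[1.0, 1.0], [1.1, 1.1]]
    nmt = [[5.0, 5.0], [5.1, 5.1]]
    vecs = zscore(ht + llm + nmt)
    labels = ["HT"] * 2 + ["LLM"] * 2 + ["NMT"] * 2
    ht_c = centroid(vecs[:2])
    print(loo_nearest_centroid_accuracy(vecs, labels))
    print(mean_distance_to(vecs[2:4], ht_c), mean_distance_to(vecs[4:], ht_c))
```

High leave-one-out accuracy indicates that the translation types are stylistically separable, and a smaller mean distance to the HT centroid for LLM vectors than for NMT vectors corresponds to the kind of finding the abstract reports for the CTT-specific features.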