Can Peter Pan Survive MT? A Stylometric Study of LLMs, NMTs, and HTs in Children's Literature Translation

📅 2025-06-27

🤖 AI Summary

This study investigates stylistic appropriateness in English-to-Chinese children’s literature translation, comparing large language model (LLM) and neural machine translation (NMT) outputs against human translations (HTs). It constructs the first human–LLM–NMT parallel corpus based on multiple translations of *Peter Pan*. Methodologically, it introduces a 447-dimensional stylistic feature set that combines generic features (lexical, syntactic, readability, and n-gram) with features tailored to creative text translation (CTT), and applies classification and clustering analyses to assess how the three translation types differ in both general and children’s-literature-specific linguistic properties. Results indicate that LLM outputs outperform NMT outputs on key CTT dimensions (including repetitiveness, rhythmicity, conjunction and adverb usage, and monosyllabic word frequency ratio) and show stylistic distributions markedly closer to those of human translations. These findings suggest that LLMs have stronger stylistic generation capability and greater practical potential than NMT systems for children’s literature translation.

📝 Abstract

This study evaluates the performance of machine translations (MTs) compared to human translations (HTs) in English-to-Chinese children's literature translation (CLT) from a stylometric perspective. The research constructs a Peter Pan corpus comprising 21 translations: 7 human translations (HTs), 7 large language model translations (LLMs), and 7 neural machine translation outputs (NMTs). The analysis employs a generic feature set (including lexical, syntactic, readability, and n-gram features) and a creative text translation (CTT)-specific feature set, which captures repetition, rhythm, translatability, and miscellaneous levels, yielding 447 linguistic features in total. Using machine-learning classification and clustering techniques, we conduct a stylometric analysis of these translations. Results reveal that, in generic features, HTs and MTs exhibit significant differences in conjunction distributions and the ratio of 1-word-gram-YiYang, while NMTs and LLMs show significant variation in descriptive word usage and adverb ratios. Regarding CTT-specific features, LLMs outperform NMTs in distribution, aligning more closely with HTs in stylistic characteristics, which demonstrates the potential of LLMs in CLT.
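To make the stylometric workflow concrete, the following is a minimal sketch of the general idea: turn each translation into a small feature vector, then compare vectors by distance to per-group centroids. It is not the paper's pipeline. The four features, the tiny `CONJUNCTIONS` list, the function names, and the nearest-centroid rule are all illustrative assumptions standing in for the 447 features and the unspecified classification and clustering methods. Input is assumed to be already word-segmented Chinese text.

```python
from collections import Counter
import math

# Hypothetical, deliberately tiny conjunction list (an assumption for illustration;
# the paper's actual lexicon is not given in this summary).
CONJUNCTIONS = {"和", "但是", "因为", "所以", "而且"}


def features(tokens):
    """Return a 4-dimensional stylometric vector for one pre-segmented text."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    n = len(tokens)
    counts = Counter(tokens)
    # Lexical variety: distinct words divided by total words.
    type_token_ratio = len(counts) / n
    # Share of tokens that are conjunctions from the toy list above.
    conjunction_ratio = sum(counts[c] for c in CONJUNCTIONS) / n
    # Single-character words are treated as monosyllabic (an assumption).
    monosyllabic_ratio = sum(1 for t in tokens if len(t) == 1) / n
    # Crude repetition signal: token identical to the one right before it.
    adjacent_repeat_ratio = sum(1 for a, b in zip(tokens, tokens[1:]) if a == b) / n
    return [type_token_ratio, conjunction_ratio, monosyllabic_ratio, adjacent_repeat_ratio]


def centroid(vectors):
    """Component-wise mean of equally sized feature vectors."""
    return [sum(column) / len(column) for column in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_label(vector, centroids):
    """Return the group label whose centroid lies closest to the vector."""
    return min(centroids, key=lambda label: euclidean(vector, centroids[label]))
```

In use, one would compute `features` for every translation, build one centroid per group (HT, LLM, NMT), and then check which centroid an unseen translation falls nearest to. With real data the features should be standardized first, since ratios on different scales would otherwise dominate the distance.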

Problem

Research questions and friction points this paper is trying to address.

Evaluating machine vs human translations in children's literature

Comparing stylometric features of LLMs, NMTs, and HTs

Assessing LLMs' potential in creative text translation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Constructs Peter Pan corpus with HTs, LLMs, NMTs

Analyzes 447 linguistic features stylometrically

Shows that LLMs align more closely with HTs than NMTs do in style
