Narrative Flattening: How Post-Training Compresses Thematic, Affective, and Stylistic Variation in LLM Fiction

📅 2026-05-26

🤖 AI Summary

This study addresses narrative flattening in fiction generated by large language models: a widely perceived lack of narrative depth and a structural homogeneity whose origins and cross-domain effects remain poorly understood. The work introduces the term "narrative flattening" and systematically quantifies how post-training compresses narrative dynamism. Using a single OLMo 32B model family (Base, SFT, DPO, and RLVR checkpoints) alongside matched human text, it runs controlled story-continuation experiments in three domains (StoryStar, TMAS, and The New Yorker) and evaluates them with sentence-level measures of topic shifts, emotional distribution, and linguistic diversity. The results indicate that post-training reduces the diversity of thematic transitions, attenuates high-intensity emotional expression, and narrows stylistic variation. Professional literary writing shows the largest compression, and post-trained outputs converge toward a similar profile regardless of domain.

📝 Abstract

Large language models produce fluent fiction, yet their creative output is widely seen as flat. We ask where this quality originates in the training and whether it affects different domains of human fiction equally. We construct a matched story-continuation paradigm across StoryStar (public-platform), TMAS (prompt-guided), and The New Yorker (professional literary), and we compare continuations from four OLMo 32B checkpoints (Base, SFT, DPO, RLVR) against matched human text. Because these checkpoints share architecture, scale, tokenizer, and pretraining, the design isolates the post-training effect. We measure each continuation along three sentence-level dimensions: thematic motion, affective prevalence, and linguistic diversity. Across all three, post-training compresses dynamic variation: thematic transitions become more uniform, high-intensity emotions give way to neutrality, and stylistic diversity across stories shrinks. We term this progressive loss narrative flattening. The effect is directionally stable across story domains, but the size of the gap depends on the human baseline: professional literary fiction is compressed most, while public-platform and prompt-guided stories show smaller gaps, consistent with their human baselines sitting closer to the model's default rhythm. Post-trained endpoints converge across domains, suggesting alignment produces a continuation regime largely insensitive to the source domain's narrative texture.
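The abstract does not say how the sentence-level dimensions are computed. As a rough illustration only, the sketch below uses hypothetical stand-ins for two of them: thematic motion as bag-of-words cosine distance between consecutive sentences (a crude proxy for the topic or embedding shifts a real pipeline would more likely use), the population standard deviation of those shifts as one reading of how "uniform" transitions are, and distinct-n over a set of stories as a cross-story diversity proxy. All function names are illustrative assumptions, not the authors' code. Affective prevalence is left out because it would need an emotion classifier.

```python
import math
import re
from collections import Counter
from statistics import pstdev


def split_sentences(text: str) -> list[str]:
    # Hypothetical naive splitter: breaks after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def bow(sentence: str) -> Counter:
    # Lowercased word counts, standing in for a sentence embedding (assumption).
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    # 1 - cosine similarity of two count vectors; empty vectors count as maximally distant.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def thematic_shifts(text: str) -> list[float]:
    # Distance between each pair of consecutive sentences (hypothetical "thematic motion").
    vecs = [bow(s) for s in split_sentences(text)]
    return [cosine_distance(x, y) for x, y in zip(vecs, vecs[1:])]


def shift_spread(text: str) -> float:
    # Spread of the shifts; a lower value means more uniform transitions.
    shifts = thematic_shifts(text)
    return pstdev(shifts) if len(shifts) > 1 else 0.0


def distinct_n(texts: list[str], n: int = 2) -> float:
    # Fraction of unique word n-grams pooled across a set of stories.
    grams = []
    for t in texts:
        toks = re.findall(r"[a-z']+", t.lower())
        grams.extend(zip(*(toks[i:] for i in range(n))))
    return len(set(grams)) / len(grams) if grams else 0.0
```

Under these assumptions, one would compare `shift_spread` and `distinct_n` for human continuations against Base, SFT, DPO and RLVR continuations of the same prompts. Lower values for the post-trained checkpoints would be the pattern the abstract calls flattening, though a lexical proxy like this is far noisier than topic- or embedding-based measures.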

Problem

narrative flattening

large language models

fiction generation

post-training

stylistic variation

Innovation

narrative flattening

post-training compression

thematic variation