Spoiler Alert: Narrative Forecasting as a Metric for Tension in LLM Storytelling

📅 2026-04-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two problems: large language models (LLMs) struggle to generate stories with high narrative tension, and existing evaluation metrics fail to capture this dimension. The authors propose the 100-Endings metric, which walks through a story sentence by sentence, samples 100 predicted endings at each position, and quantifies tension as the rate at which those predictions fail to match the actual ending. The resulting sentence-level curve also yields geometric statistics, such as the inflection rate (how often the curve reverses direction), that track twists and revelations. Grounded in narratological principles, the authors build a generation pipeline with structural constraints: analysis of story templates, idea formulation, and narrative scaffolding. The pipeline significantly increases narrative tension as measured by 100-Endings while maintaining performance on the EQ-Bench leaderboard. Unlike rubric-based LLM judges, the 100-Endings metric ranks human-authored New Yorker stories far above LLM-generated texts.

📝 Abstract

LLMs have so far failed both to generate consistently compelling stories and to recognize this failure: on the leading creative-writing benchmark (EQ-Bench), LLM judges rank zero-shot AI stories above New Yorker short stories, a gold standard for literary fiction. We argue that existing rubrics overlook a key dimension of compelling human stories: narrative tension. We introduce the 100-Endings metric, which walks through a story sentence by sentence: at each position, a model makes 100 predictions of how the story will end given only the text so far, and we measure tension as how often these predictions fail to match the ground truth. Beyond the mismatch rate, the sentence-level curve yields complementary statistics, such as inflection rate, a geometric measure of how frequently the curve reverses direction, tracking twists and revelations. Unlike rubric-based judges, 100-Endings correctly ranks New Yorker stories far above LLM outputs. Grounded in narratological principles, we design a story-generation pipeline using structural constraints, including analysis of story templates, idea formulation, and narrative scaffolding. Our pipeline significantly increases narrative tension as measured by the 100-Endings metric, while maintaining performance on the EQ-Bench leaderboard.
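
The metric described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `predict_ending` and `matches_truth` are hypothetical callables standing in for the ending-prediction model and the comparison against the true ending, and the normalization used in `inflection_rate` (direction reversals divided by the number of interior points) is an assumption, since the abstract only says the statistic measures how frequently the curve reverses direction.

```python
from typing import Callable, List, Sequence


def mismatch_curve(
    sentences: Sequence[str],
    predict_ending: Callable[[str], str],   # hypothetical: samples one predicted ending
    matches_truth: Callable[[str], bool],   # hypothetical: does a prediction match the real ending?
    n_samples: int = 100,
) -> List[float]:
    """Return, for each sentence position, the fraction of sampled endings that miss."""
    curve = []
    for i in range(1, len(sentences) + 1):
        prefix = " ".join(sentences[:i])  # only the text seen so far
        misses = sum(
            not matches_truth(predict_ending(prefix)) for _ in range(n_samples)
        )
        curve.append(misses / n_samples)
    return curve


def inflection_rate(curve: Sequence[float]) -> float:
    """Share of interior points at which the curve reverses direction.

    Assumption: flat segments are skipped and the count is normalized by
    the number of interior points; the paper may define this differently.
    """
    if len(curve) < 3:
        return 0.0
    deltas = [b - a for a, b in zip(curve, curve[1:]) if b != a]
    reversals = sum(1 for d1, d2 in zip(deltas, deltas[1:]) if d1 * d2 < 0)
    return reversals / (len(curve) - 2)
```

In practice `predict_ending` would wrap a sampled LLM call and `matches_truth` a judge or similarity check; both choices affect the scores and are left open here.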

Problem

Research questions and friction points this paper is trying to address.

narrative tension

storytelling

large language models

story generation

narrative evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

narrative tension

100-Endings metric

story generation