🤖 AI Summary
This work addresses the formal definition and quantitative evaluation of the “fair play” principle in detective fiction: how to balance narrative coherence with plot surprise while satisfying readers’ reasonable inferential expectations. The authors propose a probabilistic framework for fairness assessment that jointly models coherence and surprise to evaluate the narrative quality of LLM-generated detective stories. Methodologically, it combines LLM-based story generation, narrative structure parsing, and statistical significance testing to computationally measure clue distribution, foreshadowing traceability, and the logical plausibility of plot twists. Experiments reveal that state-of-the-art LLMs, despite achieving high surprise scores, systematically violate fair play: clues are either excessively concealed or insufficiently grounded in causal logic, undermining both suspense and deductive validity. The framework offers a theoretical foundation and an empirical toolkit for AI-driven narrative evaluation, exposing a fundamental limitation of generative models in causal, inference-oriented storytelling.
📝 Abstract
Effective storytelling relies on a delicate balance between meeting the reader's prior expectations and introducing unexpected developments. In detective fiction, this tension is known as fair play: an implicit agreement between writer and reader about the range of possible resolutions a mystery story may have. In this work, we present a probabilistic framework for detective fiction that allows us to define desired narrative qualities. Using this framework, we formally define fair play and design appropriate metrics for it. Stemming from these definitions is an inherent tension between the coherence of the story, which measures how much it "makes sense", and the surprise it induces. We validate the framework by applying it to LLM-generated detective stories. This domain is appealing because data is abundant, we can sample from the distribution generating the story, and the story-writing capabilities of LLMs are interesting in their own right. Results show that while LLM-generated stories may be unpredictable, they generally fail to balance the trade-off between surprise and fair play, which greatly contributes to their poor quality.
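The abstract does not spell out the paper's formal definitions, but the coherence/surprise tension it describes is commonly operationalized with model probabilities. As a hypothetical sketch (the function names, bit-based units, and the toy probabilities below are illustrative assumptions, not the paper's actual metrics): surprise can be measured as the surprisal of the resolution before the clues are weighed, and coherence as how well the story's events fit the model once everything is known.

```python
import math

def surprisal(p_event: float) -> float:
    """Shannon surprisal (in bits) of an event with probability p_event.
    Rarer events are more surprising."""
    return -math.log2(p_event)

def coherence(event_probs: list[float]) -> float:
    """Mean log-likelihood (in bits) of the story's events under a model.
    Closer to 0 means the story 'makes sense' to the model."""
    return sum(math.log2(p) for p in event_probs) / len(event_probs)

# Toy numbers for illustration only: a "fair" twist is improbable a
# priori (high surprisal) yet well-supported in hindsight, once all
# clues are conditioned on (coherence stays high).
prior_p_culprit = 0.05      # assumed P(culprit) before the reveal
posterior_p_culprit = 0.60  # assumed P(culprit | all clues)

print(surprisal(prior_p_culprit))                      # surprise of the twist
print(coherence([0.8, 0.7, posterior_p_culprit]))      # retrospective fit
```

A story that maximizes only surprise drives `prior_p_culprit` toward 0, but an unfair one also drives the posterior toward 0, collapsing coherence; fair play demands both quantities stay favorable at once.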