Mary, the Cheeseburger-Eating Vegetarian: Do LLMs Recognize Incoherence in Narratives?

📅 2025-12-08
🤖 AI Summary
This work investigates the capacity boundaries of large language models (LLMs) in detecting narrative incoherence. To this end, we construct a paired dataset of coherent/incoherent stories and employ probe-based analysis, systematic prompt engineering variants, and contrastive evaluation to assess consistency between internal representations and output behavior. We find that LLMs exhibit high sensitivity to violations of scene-setting constraints but remain largely insensitive to contradictions in character traits—suggesting reliance on prototypical world knowledge rather than deep narrative reasoning. While incoherence is detectable in internal model states, LLMs fail to reliably externalize such detection in question-answering tasks. This study provides the first empirical evidence of a “representation–behavior gap” in LLMs’ narrative understanding: although latent incoherence is encoded, models lack the inferential mechanism to map implicit cognitive states to consistent, task-appropriate judgments. Our findings establish a novel benchmark and theoretical framework for evaluating and advancing narrative AI.

📝 Abstract
Leveraging a dataset of paired narratives, we investigate the extent to which large language models (LLMs) can reliably separate incoherent from coherent stories. A probing study finds that LLMs' internal representations can reliably identify incoherent narratives. However, LLMs generate responses to rating questions that fail to satisfactorily separate the coherent and incoherent narratives across several prompt variations, hinting at a gap in LLMs' understanding of storytelling. The reasoning LLMs tested do not eliminate these deficits, indicating that thought strings may not fully address the discrepancy between model internal state and behavior. Additionally, we find that LLMs appear to be more sensitive to incoherence resulting from an event that violates the setting (e.g., a rainy day in the desert) than to incoherence arising from a character violating an established trait (e.g., Mary, a vegetarian, later orders a cheeseburger), suggesting that LLMs may rely more on prototypical world knowledge than on building meaning-based narrative coherence. The consistent asymmetry found in our results suggests that LLMs do not have a complete grasp of narrative coherence.
Problem

Research questions and friction points this paper is trying to address.

Investigates LLMs' ability to detect narrative incoherence.
Examines gap between internal model detection and external behavior.
Assesses LLMs' reliance on world knowledge over narrative meaning.
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs detect incoherence via internal representations.
LLMs fail to separate coherent and incoherent narratives in generated responses.
LLMs rely more on prototypical world knowledge than on narrative coherence.
Karin de Langis
PhD Candidate, University of Minnesota
Artificial Intelligence · Robotics · Computer Vision
Püren Öncel
Department of Educational Psychology, University of Minnesota
Ryan Peters
Department of Computer Science and Engineering, University of Minnesota
Andrew Elfenbein
Department of English, University of Minnesota
Laura Kristen Allen
Department of Educational Psychology, University of Minnesota
Andreas Schramm
Department of Linguistics, Hamline University
Dongyeop Kang
University of Minnesota
Natural Language Processing