Deconstructing sentence disambiguation by joint latent modeling of reading paradigms: LLM surprisal is not enough

📅 2026-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional surprisal metrics derived from large language models (LLMs) struggle to capture the cognitive load humans experience when processing garden-path sentences. This work proposes a joint latent-variable mixture model that integrates data from four experimental paradigms (eye-tracking, uni- and bidirectional self-paced reading, and the Maze task) within a unified framework. The model explicitly disentangles garden-path probability, garden-path cost, and reanalysis cost, while also accounting for attention-lapse trials. By moving beyond LLM surprisal alone, the approach reproduces key empirical patterns, including rereading behavior, comprehension-question accuracy, and grammaticality judgments, and it outperforms a surprisal-only baseline in predicting both human reading behavior and end-of-trial task performance.
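
To make the latent-mixture idea concrete, the snippet below simulates reading times at the disambiguating region under a simplified three-way mixture: inattentive (lapse) trials, garden-pathed trials that incur both a garden-path and a reanalysis cost, and trials in which the correct parse is adopted from the start. This is only an illustrative sketch, not the authors' model (which is fit jointly across four paradigms and also covers regressions, comprehension questions, and grammaticality judgments); all parameter names and values here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_disambiguating_rt(n_trials=1000,
                               p_lapse=0.1,          # assumed probability of an inattentive trial
                               p_gp=0.6,             # assumed probability of being garden-pathed
                               base_mu=5.7,          # baseline log reading time at the critical region
                               gp_cost=0.25,         # extra log-RT cost of taking the garden path
                               reanalysis_cost=0.15, # extra log-RT cost of recovering the correct parse
                               sigma=0.3,            # trial-level lognormal noise
                               lapse_mu=5.0):        # fast, content-independent lapse reading times
    """Simulate reading times under a three-component latent mixture:
    lapse trials, garden-path trials, and correctly parsed trials."""
    rts = np.empty(n_trials)
    for i in range(n_trials):
        if rng.random() < p_lapse:
            # Inattentive trial: the reading time does not reflect sentence processing.
            rts[i] = rng.lognormal(lapse_mu, sigma)
        elif rng.random() < p_gp:
            # Garden-pathed trial: misanalysis plus reanalysis cost at disambiguation.
            rts[i] = rng.lognormal(base_mu + gp_cost + reanalysis_cost, sigma)
        else:
            # Correct parse adopted immediately: baseline reading time only.
            rts[i] = rng.lognormal(base_mu, sigma)
    return rts

ambiguous = simulate_disambiguating_rt(p_gp=0.6)
unambiguous = simulate_disambiguating_rt(p_gp=0.0)
print(f"mean RT, ambiguous condition:   {ambiguous.mean():.0f} ms")
print(f"mean RT, unambiguous condition: {unambiguous.mean():.0f} ms")
```

Separating the mixture weight (garden-path probability) from the two cost parameters is what allows the model to estimate how often readers are garden-pathed independently of how costly the misanalysis is when it occurs.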

📝 Abstract
Using temporarily ambiguous garden-path sentences ("While the team trained the striker wondered ...") as a test case, we present a latent-process mixture model of human reading behavior across four different reading paradigms (eye tracking, uni- and bidirectional self-paced reading, Maze). The model distinguishes between garden-path probability, garden-path cost, and reanalysis cost, and yields more realistic processing cost estimates by taking into account trials with inattentive reading. We show that the model is able to reproduce empirical patterns with regard to rereading behavior, comprehension question responses, and grammaticality judgments. Cross-validation reveals that the mixture model also has better predictive fit to human reading patterns and end-of-trial task data than a mixture-free model based on GPT-2-derived surprisal values. We discuss implications for future work.
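
The surprisal-only baseline mentioned above relies on GPT-2 word probabilities. A minimal sketch of how per-token surprisal can be obtained from GPT-2 with the Hugging Face transformers library is shown below; the model variant, example sentence, and the decision to report token-level (rather than word-level) surprisal are assumptions and may differ from the paper's exact pipeline.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Illustrative garden-path sentence; not necessarily an item from the paper's stimuli.
sentence = "While the team trained the striker wondered about the coach."

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

ids = tokenizer(sentence, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits

# Surprisal of token t is -log2 P(token_t | tokens_<t). The first token has no
# left context under GPT-2 and is skipped here.
log_probs = torch.log_softmax(logits, dim=-1)
for pos in range(1, ids.shape[1]):
    token_id = ids[0, pos]
    surprisal = -log_probs[0, pos - 1, token_id] / torch.log(torch.tensor(2.0))
    print(f"{tokenizer.decode(token_id):>12}  {surprisal.item():.2f} bits")
```

In a surprisal-only account, the value at the disambiguating word ("wondered") would be the sole predictor of processing difficulty; the paper's point is that such a predictor, on its own, does not capture the mixture structure of the behavioral data.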
Problem

Research questions and friction points this paper is trying to address.

sentence disambiguation
garden-path sentences
reading paradigms
processing cost
human reading behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

latent-process mixture model
garden-path sentences
reading paradigms
surprisal
inattentive reading
Dario Paape
Department of Linguistics, University of Potsdam
Tal Linzen
New York University
Language models, Computational linguistics, Natural language processing, Cognitive science
S. Vasishth
Department of Linguistics, University of Potsdam