Expect the Unexpected? Testing the Surprisal of Salient Entities

📅 2026-04-12
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This study addresses a gap in tests of the Uniform Information Density hypothesis by incorporating the salience of discourse participants, a factor overlooked in prior work. Using a corpus of 70,000 human-annotated English discourse mentions spanning 16 genres, the authors investigate how global entity salience relates to contextual predictability through surprisal metrics, minimal-pair prompting, and analyses that control for confounds such as position, length, and nesting. They find that salient entities themselves exhibit higher surprisal, yet, when used as prompts, consistently reduce the surprisal of surrounding content, increasing document-level predictability. This effect is strongest in topically coherent texts and weakest in conversational genres, identifying entity salience as a mechanism that shapes how information is distributed in discourse.
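For reference, surprisal has a standard definition, shown below together with one common way of quantifying how evenly it is spread over a document: the variance of per-token surprisal, as used in some earlier UID work. This page does not state which uniformity metric, base of logarithm, or tokenization the authors use, so treat the second line as an illustrative assumption rather than the paper's formula.

```latex
s(w_t) = -\log_2 P(w_t \mid w_{<t})

\bar{s} = \frac{1}{N}\sum_{t=1}^{N} s(w_t), \qquad
\mathrm{UID}_{\mathrm{var}} = \frac{1}{N}\sum_{t=1}^{N}\bigl(s(w_t) - \bar{s}\bigr)^2
```

Here $N$ is the number of tokens in the document; a lower $\mathrm{UID}_{\mathrm{var}}$ means information is distributed more uniformly.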

πŸ“ Abstract
Previous work examining the Uniform Information Density (UID) hypothesis has shown that while information as measured by surprisal metrics is distributed more or less evenly across documents overall, local discrepancies can arise due to functional pressures corresponding to syntactic and discourse structural constraints. However, work thus far has largely disregarded the relative salience of discourse participants. We fill this gap by studying how overall salience of entities in discourse relates to surprisal using 70K manually annotated mentions across 16 genres of English and a novel minimal-pair prompting method. Our results show that globally salient entities exhibit significantly higher surprisal than non-salient ones, even controlling for position, length, and nesting confounds. Moreover, salient entities systematically reduce surprisal for surrounding content when used as prompts, enhancing document-level predictability. This effect varies by genre, appearing strongest in topic-coherent texts and weakest in conversational contexts. Our findings refine the UID competing pressures framework by identifying global entity salience as a mechanism shaping information distribution in discourse.
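The abstract does not spell out the minimal-pair prompting procedure, so the sketch below assumes one plausible reading. The same continuation is scored after two prompts that differ only in the entity, and the drop in mean surprisal is taken as the prompt's effect. To stay self-contained it substitutes a toy add-one-smoothed bigram model for the neural language model a real study would use. It also approximates "salience" by how often an entity recurs in a hypothetical three-sentence corpus, a simplification of the paper's manual annotation. All function names, the corpus, and the prompts are hypothetical.

```python
import math
from collections import Counter

# Toy stand-in for a language model: an add-one-smoothed bigram model.
# A real study would score text with a neural LM; this only makes the arithmetic concrete.

def train_bigram(corpus):
    """Count bigram contexts and bigrams over lowercased whitespace tokens."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split()
        vocab.update(tokens)
        contexts.update(tokens[:-1])  # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocab

def surprisal(prev, word, model):
    """Surprisal in bits of `word` given `prev`: -log2 P(word | prev)."""
    contexts, bigrams, vocab = model
    v = len(vocab) + 1  # +1 reserves probability mass for unseen words
    p = (bigrams[(prev, word)] + 1) / (contexts[prev] + v)
    return -math.log2(p)

def mean_surprisal(prompt, continuation, model):
    """Average per-token surprisal of `continuation` after `prompt`."""
    prev = (["<s>"] + prompt.lower().split())[-1]  # a bigram model sees only the last prompt token
    words = continuation.lower().split()
    total = 0.0
    for word in words:
        total += surprisal(prev, word, model)
        prev = word
    return total / len(words)

# Hypothetical corpus in which "captain" recurs and "clerk" appears once.
CORPUS = [
    "the captain sailed the ship",
    "the captain steered the ship home",
    "the clerk filed the report",
]
MODEL = train_bigram(CORPUS)

# Minimal pair: identical continuation, prompts differ only in the entity mention.
continuation = "sailed the ship"
salient = mean_surprisal("the captain", continuation, MODEL)
non_salient = mean_surprisal("the clerk", continuation, MODEL)
delta = non_salient - salient  # > 0 means the more salient prompt lowers surprisal

print(f"salient prompt:     {salient:.2f} bits/token")
print(f"non-salient prompt: {non_salient:.2f} bits/token")
print(f"reduction:          {delta:.2f} bits/token")
```

Holding the continuation fixed is what makes the pair "minimal": any difference in mean surprisal can only come from the entity in the prompt. With a bigram model that influence reaches only the first continuation token, whereas a neural LM would let it propagate further.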
Problem

Research questions and friction points this paper is trying to address.

Uniform Information Density
surprisal
entity salience
discourse structure
information distribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

entity salience
surprisal
Uniform Information Density
minimal-pair prompting
discourse predictability