AI Summary
This study addresses a gap in tests of the Uniform Information Density (UID) hypothesis by incorporating the salience of discourse participants, a factor overlooked in prior work. Using a corpus of 70,000 human-annotated mentions across 16 genres of English, the authors investigate how global entity salience relates to contextual predictability. They combine surprisal metrics, minimal-pair prompting, and controls for position, length, and nesting. They find that salient entities themselves carry higher surprisal. However, when used as prompts, these entities consistently lower the surprisal of surrounding content, which makes the document as a whole more predictable. The effect is strongest in topically coherent texts and weakest in conversational genres, which points to global entity salience as one mechanism shaping how information is distributed in discourse.
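Both findings rest on surprisal: the negative log probability that a language model assigns to a token given its preceding context. Less predictable tokens therefore have higher surprisal. The first line below is the standard definition. Base 2 (bits) is a conventional choice, not one confirmed here. The second line sums token surprisal over the tokens of a mention m. That aggregation is an assumption for illustration, since the summary does not say whether multi-token mentions are summed, averaged, or otherwise normalized.

```latex
s(w_i) = -\log_2 P(w_i \mid w_1, \ldots, w_{i-1})

s(m) = \sum_{w_i \in m} s(w_i) \qquad \text{(assumed aggregation over a mention } m\text{)}
```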
Abstract
Previous work examining the Uniform Information Density (UID) hypothesis has shown that information, as measured by surprisal metrics, is distributed more or less evenly across documents overall. Even so, local discrepancies can arise from functional pressures corresponding to syntactic and discourse-structural constraints. However, work thus far has largely disregarded the relative salience of discourse participants. We fill this gap by studying how the overall salience of entities in discourse relates to surprisal, using 70K manually annotated mentions across 16 genres of English and a novel minimal-pair prompting method. Our results show that globally salient entities exhibit significantly higher surprisal than non-salient ones, even when controlling for position, length, and nesting confounds. Moreover, salient entities systematically reduce surprisal for surrounding content when used as prompts, enhancing document-level predictability. This effect varies by genre, appearing strongest in topic-coherent texts and weakest in conversational contexts. Our findings refine the UID competing-pressures framework by identifying global entity salience as a mechanism shaping information distribution in discourse.
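The comparison behind the minimal-pair prompting method can be illustrated with a small sketch. Several details here are assumptions not stated in the abstract. Each prompt pair is taken to differ only in which entity it mentions. A language model (not shown) is taken to supply per-token probabilities for the same continuation under both prompts. The comparison is taken to use mean per-token surprisal. The function names and toy probabilities are hypothetical.

```python
import math
from statistics import mean

# Hypothetical helpers: token probabilities are assumed to come from a
# language model scoring the same continuation under each prompt (not shown).

def token_surprisals(probs):
    """Surprisal in bits, -log2 P(w_i | context), for each token probability."""
    if any(not 0.0 < p <= 1.0 for p in probs):
        raise ValueError("probabilities must be in (0, 1]")
    return [-math.log2(p) for p in probs]


def mean_surprisal(probs):
    """Average per-token surprisal (bits) of a continuation."""
    return mean(token_surprisals(probs))


def minimal_pair_delta(probs_salient_prompt, probs_control_prompt):
    """Difference in mean surprisal of one continuation under two prompts
    that (by assumption) differ only in which entity they mention.

    A negative value means the salient-entity prompt made the continuation
    more predictable (lower surprisal) than the control prompt.
    """
    if len(probs_salient_prompt) != len(probs_control_prompt):
        raise ValueError("both prompts must score the same continuation tokens")
    return mean_surprisal(probs_salient_prompt) - mean_surprisal(probs_control_prompt)


# Toy probabilities for a three-token continuation (invented for illustration).
salient = [0.5, 0.25, 0.5]     # surprisals 1, 2, 1 bits
control = [0.25, 0.125, 0.25]  # surprisals 2, 3, 2 bits
print(round(minimal_pair_delta(salient, control), 3))  # -1.0
```

A negative delta is the pattern the abstract reports for salient entities: the surrounding content becomes easier to predict. In a real setup the probabilities would come from scoring identical continuation tokens under each prompt. That is why the sketch rejects inputs of unequal length.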