🤖 AI Summary
This study investigates the cognitive and linguistic foundations of entity salience in discourse, specifically why certain entities are more readily attended to and retained in memory.
Method: Drawing on a corpus spanning 24 spoken and written genres of English, we propose a hierarchical salience metric grounded in *summary value*, integrating syntactic prominence, discourse relations (e.g., coherence structure), and pragmatic functions (e.g., information status and role importance). We extract features across levels, including syntactic position, anaphoric density, semantic role, and inferential load, and train an interpretable predictive model validated cross-genre.
Contribution/Results: We demonstrate that entity salience is inherently multi-layered and cannot be adequately captured by single cues (e.g., definiteness or topical repetition). Instead, it emerges from the interaction of syntactic, referential, semantic, and pragmatic factors. Our framework advances computational discourse understanding and coreference resolution by offering both theoretical grounding, which positions salience as a graded, context-sensitive construct, and a robust, explainable modeling approach.
📝 Abstract
Entities in discourse vary broadly in salience: main participants, objects and locations are noticeable and memorable, while tangential ones are less important and quickly forgotten, raising questions about how humans signal and infer relative salience. Using a graded operationalization of salience based on summary-worthiness in multiple summaries of a discourse, this paper explores data from 24 spoken and written genres of English to extract a multifactorial complex of overt and implicit linguistic cues, such as recurring subjecthood or definiteness, discourse relations and hierarchy across utterances, as well as pragmatic functional inferences based on genre and communicative intent. Tackling the question 'how is the degree of salience expressed for each and every entity mentioned?', we show that while previous approaches to salience all correlate with our salience scores to some extent, no single generalization is without exceptions, and the phenomenon cuts across all levels of linguistic representation.
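To make the idea of a graded, summary-based salience score concrete, the following is a minimal Python sketch. It assumes, purely for illustration, that an entity's score is the number of independent summaries that mention it. The abstract does not specify the exact scoring procedure, so the function name `graded_salience`, the set-based input format, and the toy data are all hypothetical.

```python
from collections import Counter


def graded_salience(entities, summary_mentions):
    """Score each entity by how many summaries mention it.

    Assumption (not from the paper): one point per summary that mentions
    the entity, so scores range from 0 to len(summary_mentions).
    """
    known = set(entities)
    counts = Counter()
    for mentioned in summary_mentions:
        # Count each entity at most once per summary.
        counts.update(set(mentioned) & known)
    return {entity: counts[entity] for entity in entities}


# Hypothetical toy data: three summaries of one document.
entities = ["chef", "kitchen", "spoon"]
summary_mentions = [
    {"chef", "kitchen"},
    {"chef"},
    {"chef", "kitchen"},
]

scores = graded_salience(entities, summary_mentions)
print(scores)
```

Under this assumed scheme an entity never mentioned in any summary scores 0, and one mentioned in every summary scores the maximum, which gives a graded rather than binary notion of salience.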