GUMBridge: a Corpus for Varieties of Bridging Anaphora

📅 2025-12-07
🤖 AI Summary
Existing English resources for bridging anaphora are limited in scale, genre coverage, and granularity of subtype annotation. This paper introduces GUMBridge, a multi-genre bridging corpus covering 16 English genres, with fine-grained annotation of bridging relations (eight subtypes). Annotations are manually curated, and the paper reports an evaluation of annotation quality. Baseline evaluations with contemporary LLMs on three tasks (bridging resolution, bridging subtype classification, and bridging detection) yield F1 scores consistently below 60%, indicating that bridging remains a difficult NLP task. GUMBridge thus provides a benchmark resource and evaluation framework for research on bridging phenomena.

📝 Abstract
Bridging is an anaphoric phenomenon where the referent of an entity in a discourse is dependent on a previous, non-identical entity for interpretation, such as in "There is 'a house'. 'The door' is red," where the door is specifically understood to be the door of the aforementioned house. While there are several existing resources in English for bridging anaphora, most are small, provide limited coverage of the phenomenon, and/or provide limited genre coverage. In this paper, we introduce GUMBridge, a new resource for bridging, which includes 16 diverse genres of English, providing both broad coverage for the phenomenon and granular annotations for the subtype categorization of bridging varieties. We also present an evaluation of annotation quality and report on baseline performance using open- and closed-source contemporary LLMs on three tasks underlying our data, showing that bridging resolution and subtype classification remain difficult NLP tasks in the age of LLMs.
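
To make the house/door example concrete, here is a minimal Python sketch of how a single bridging link might be represented and how predicted links could be scored against gold links. Everything in it is an illustrative assumption: the `BridgingLink` class, the `link_f1` function, the `"part-whole"` label, and the exact-match scoring scheme are hypothetical and are not taken from GUMBridge's actual data format, subtype inventory, or evaluation protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BridgingLink:
    """One bridging pair: an anaphor interpreted via a non-identical antecedent."""
    anaphor: str     # e.g. "The door"
    antecedent: str  # e.g. "a house"
    subtype: str     # hypothetical label, not GUMBridge's real inventory


def link_f1(gold: set, predicted: set) -> float:
    """F1 over exact-match (anaphor, antecedent) pairs, ignoring subtype."""
    gold_pairs = {(link.anaphor, link.antecedent) for link in gold}
    pred_pairs = {(link.anaphor, link.antecedent) for link in predicted}
    true_pos = len(gold_pairs & pred_pairs)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_pairs)
    recall = true_pos / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


# Gold annotation for: There is "a house". "The door" is red.
gold = {BridgingLink("The door", "a house", "part-whole")}

# A hypothetical system output with one correct and one spurious link.
predicted = {
    BridgingLink("The door", "a house", "part-whole"),
    BridgingLink("red", "a house", "part-whole"),
}

score = link_f1(gold, predicted)
print(round(score, 3))
```

In this toy case the system recovers the one gold link but also proposes a spurious one, so recall is perfect while precision is halved. A real evaluation would work over mention spans in a document rather than raw strings.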

Problem

Research questions and friction points this paper is trying to address.

Develops a diverse corpus for bridging anaphora annotation
Addresses limited genre and phenomenon coverage in existing resources
Evaluates bridging resolution difficulty with modern language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces GUMBridge corpus with 16 diverse English genres
Provides granular annotations for bridging subtype categorization
Evaluates baseline LLM performance on bridging resolution tasks