🤖 AI Summary
In natural language inference (NLI), prevalent annotation disagreements are not random noise but often stem from semantic ambiguities in the premise or hypothesis, reflecting genuine diversity in human interpretation. Existing NLI datasets lack systematic annotation of ambiguity types (e.g., anaphora, quantification, temporal relations), hindering alignment between model predictions and human judgments. To address this, the paper argues for *ambiguity-aware NLI*: a paradigm built on (1) a unified taxonomy that integrates existing classifications of semantic ambiguity, (2) concrete examples illustrating how key ambiguity subtypes shape annotator decisions, and (3) targeted methods for identifying ambiguous premise-hypothesis pairs. Its key proposals include: (i) analyzing how specific ambiguity types give rise to patterns of annotation disagreement; (ii) constructing structured, ambiguity-annotated NLI resources; and (iii) developing unsupervised approaches to fine-grained ambiguity detection to improve model robustness and interpretability. This work lays a principled foundation for developing human-aligned NLI systems.
📝 Abstract
This position paper argues that annotation disagreement in Natural Language Inference (NLI) is not mere noise but often reflects meaningful interpretive variation, especially when triggered by ambiguity in the premise or hypothesis. While underspecified guidelines and annotator behavior can contribute to variation, content-based ambiguity offers a process-independent signal of divergent human perspectives. We call for a shift toward ambiguity-aware NLI by systematically identifying ambiguous input pairs and classifying their ambiguity types. To support this, we present a unified framework that integrates existing taxonomies and illustrate key ambiguity subtypes through concrete examples. These examples reveal how ambiguity shapes annotator decisions and motivate the need for targeted detection methods that better align models with human interpretation. A key obstacle is the lack of datasets annotated for ambiguity and its subtypes. We propose addressing this gap through new annotated resources and unsupervised approaches to ambiguity detection, paving the way for more robust, explainable, and human-aligned NLI systems.