🤖 AI Summary
In natural language inference (NLI), prevalent annotation disagreements are not random noise but often stem from semantic ambiguities in the premise or hypothesis, reflecting genuine diversity in human interpretation. Existing NLI datasets lack systematic annotation of ambiguity types (e.g., anaphora, quantification, temporal relations), hindering alignment between model predictions and human judgments. To address this, the paper argues for *ambiguity-aware NLI*: a paradigm built on (1) a unified taxonomy that integrates existing classifications of semantic ambiguity, (2) concrete examples illustrating how key ambiguity subtypes shape annotator decisions, and (3) targeted methods for identifying ambiguous premise-hypothesis pairs. Its key proposals include: (i) analyzing how specific ambiguity types give rise to patterns of annotation disagreement; (ii) constructing structured, ambiguity-annotated NLI resources; and (iii) developing unsupervised approaches to fine-grained ambiguity detection to improve model robustness and interpretability. This work lays a principled foundation for developing human-aligned NLI systems.
📝 Abstract
This position paper argues that annotation disagreement in Natural Language Inference (NLI) is not mere noise but often reflects meaningful interpretive variation, especially when triggered by ambiguity in the premise or hypothesis. While underspecified guidelines and annotator behavior can contribute to variation, content-based ambiguity offers a process-independent signal of divergent human perspectives. We call for a shift toward ambiguity-aware NLI by systematically identifying ambiguous input pairs and classifying their ambiguity types. To support this, we present a unified framework that integrates existing taxonomies and illustrate key ambiguity subtypes through concrete examples. These examples reveal how ambiguity shapes annotator decisions and motivate the need for targeted detection methods that better align models with human interpretation. A key obstacle is the lack of datasets annotated for ambiguity and its subtypes. We propose addressing this gap through new annotated resources and unsupervised approaches to ambiguity detection, paving the way for more robust, explainable, and human-aligned NLI systems.