🤖 AI Summary
This work addresses the challenge of annotator disagreement in subjective and ambiguous natural language processing tasks—such as toxicity detection and stance analysis—where divergent perspectives are often dismissed as noise rather than meaningful signals. The authors propose a domain-agnostic taxonomy of annotation disagreement alongside a unified modeling framework that explicitly captures structural relationships among annotators and supports multi-target prediction. By introducing disagreement-aware evaluation metrics, the study advocates a paradigm shift from consensus-based learning toward perspectivist modeling, offering a normative lens for fairness assessment. The paper systematically integrates existing disagreement-aware methodologies, clarifies the trajectory of this evolving paradigm, and outlines promising future directions, including the incorporation of multi-source variability and the development of interpretable disagreement frameworks.
📝 Abstract
Annotator disagreement is widespread in NLP, particularly for subjective and ambiguous tasks such as toxicity detection and stance analysis. While early approaches treated disagreement as noise to be removed, recent work increasingly models it as a meaningful signal reflecting variation in interpretation and perspective. This survey provides a unified view of disagreement-aware NLP methods. We first present a domain-agnostic taxonomy of the sources of disagreement spanning data, task, and annotator factors. We then synthesize modeling approaches using a common framework defined by prediction targets and pooling structure, highlighting a shift from consensus learning toward explicitly modeling disagreement, and toward capturing structured relationships among annotators. We review evaluation metrics for both predictive performance and annotator behavior, noting that most fairness evaluations remain descriptive rather than normative. We conclude by identifying open challenges and future directions, including integrating multiple sources of variation, developing disagreement-aware interpretability frameworks, and grappling with the practical tradeoffs of perspectivist modeling.
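To make the contrast between consensus learning and disagreement-aware modeling concrete, here is a minimal sketch (not from the paper; the labels and task are hypothetical) of the two most common prediction targets: a majority-vote consensus label versus a soft label that preserves the annotators' empirical distribution.

```python
from collections import Counter

def majority_vote(labels):
    """Consensus target: the single most frequent annotation (ties broken arbitrarily)."""
    return Counter(labels).most_common(1)[0][0]

def soft_label(labels, classes):
    """Disagreement-aware target: the empirical distribution over classes."""
    counts = Counter(labels)
    n = len(labels)
    return {c: counts.get(c, 0) / n for c in classes}

# Hypothetical annotations for one item in a toxicity task:
anns = ["toxic", "toxic", "not_toxic", "toxic", "not_toxic"]
print(majority_vote(anns))                       # "toxic"
print(soft_label(anns, ["toxic", "not_toxic"]))  # {"toxic": 0.6, "not_toxic": 0.4}
```

Consensus learning would train only on the collapsed label `"toxic"`, discarding the 40% minority view; disagreement-aware methods instead treat the full distribution (or per-annotator labels) as the prediction target.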