Tracing How Annotators Think: Augmenting Preference Judgments with Reading Processes

📅 2025-11-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
How human annotators' reading behaviors—such as rereading, skipping, and navigation paths—reflect cognitive decision-making during preference annotation remains underexplored. Method: Leveraging fine-grained mouse-tracking data that captures annotator interactions with prompts and candidate responses, this work introduces reading behavior as a new signal for analyzing preference annotation, and releases the first publicly available dataset of such behaviors, PreferRead. Contribution/Results: Statistical modeling reveals that rereading frequency is significantly positively correlated with annotation consistency, whereas longer reading paths predict lower consistency. This study establishes a cognitive dimension for interpreting human judgment in subjective NLP tasks, improving the explainability of annotator reliability, sources of disagreement, and decision biases. By grounding preference learning in empirically observed behavioral patterns, it advances the foundation for more robust and trustworthy preference modeling.

📝 Abstract
We propose an annotation approach that captures not only labels but also the reading process underlying annotators' decisions, e.g., what parts of the text they focus on, re-read or skim. Using this framework, we conduct a case study on the preference annotation task, creating a dataset PreferRead that contains fine-grained annotator reading behaviors obtained from mouse tracking. PreferRead enables detailed analysis of how annotators navigate between a prompt and two candidate responses before selecting their preference. We find that annotators re-read a response in roughly half of all trials, most often revisiting the option they ultimately choose, and rarely revisit the prompt. Reading behaviors are also significantly related to annotation outcomes: re-reading is associated with higher inter-annotator agreement, whereas long reading paths and times are associated with lower agreement. These results demonstrate that reading processes provide a complementary cognitive dimension for understanding annotator reliability, decision-making and disagreement in complex, subjective NLP tasks. Our code and data are publicly available.
Problem

Research questions and friction points this paper is trying to address.

Captures annotators' reading processes for preference judgments
Analyzes reading behaviors' impact on annotation agreement
Provides cognitive insights into annotator reliability in NLP tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Captures reading processes via mouse tracking
Creates dataset with fine-grained annotator behaviors
Links reading patterns to annotation reliability outcomes
Karin de Langis
PhD Candidate, University of Minnesota
Artificial Intelligence · Robotics · Computer Vision
William Walker
Department of Computer Science and Engineering, University of Minnesota
Khanh Chi Le
Department of Computer Science and Engineering, University of Minnesota
Dongyeop Kang
University of Minnesota
Natural Language Processing