Tracing How Annotators Think: Augmenting Preference Judgments with Reading Processes

📅 2025-11-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
How human annotators' reading behaviors—such as rereading, skipping, and navigation paths—reflect cognitive decision-making during preference annotation remains underexplored. Method: Leveraging fine-grained mouse-tracking data that captures annotator interactions with prompts and candidate responses, this work introduces reading behavior as a new signal for analyzing preference annotation, and releases the first publicly available dataset of such behaviors, PreferRead. Contribution/Results: Statistical modeling reveals that rereading frequency is significantly positively correlated with annotation consistency, whereas longer reading paths predict lower consistency. This study establishes a cognitive dimension for interpreting human judgment in subjective NLP tasks, improving the explainability of annotator reliability, sources of disagreement, and decision biases. By grounding preference learning in empirically observed behavioral patterns, it advances the foundation for more robust and trustworthy preference modeling.

📝 Abstract
We propose an annotation approach that captures not only labels but also the reading process underlying annotators' decisions, e.g., what parts of the text they focus on, re-read or skim. Using this framework, we conduct a case study on the preference annotation task, creating a dataset PreferRead that contains fine-grained annotator reading behaviors obtained from mouse tracking. PreferRead enables detailed analysis of how annotators navigate between a prompt and two candidate responses before selecting their preference. We find that annotators re-read a response in roughly half of all trials, most often revisiting the option they ultimately choose, and rarely revisit the prompt. Reading behaviors are also significantly related to annotation outcomes: re-reading is associated with higher inter-annotator agreement, whereas long reading paths and times are associated with lower agreement. These results demonstrate that reading processes provide a complementary cognitive dimension for understanding annotator reliability, decision-making and disagreement in complex, subjective NLP tasks. Our code and data are publicly available.
Problem

Research questions and friction points this paper is trying to address.

Captures annotators' reading processes for preference judgments
Analyzes reading behaviors' impact on annotation agreement
Provides cognitive insights into annotator reliability in NLP tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Captures reading processes via mouse tracking
Creates dataset with fine-grained annotator behaviors
Links reading patterns to annotation reliability outcomes
Karin de Langis
PhD Candidate, University of Minnesota
Artificial Intelligence · Robotics · Computer Vision
William Walker
Department of Computer Science and Engineering, University of Minnesota
Khanh Chi Le
Department of Computer Science and Engineering, University of Minnesota
Dongyeop Kang
University of Minnesota
Natural Language Processing