Dealing with Annotator Disagreement in Hate Speech Classification

2025-02-12
Citations: 0
Influential: 0
AI Summary
This work addresses severe and long-overlooked inter-annotator subjectivity disagreement in Turkish tweet hate speech classification. We propose a BERT fine-tuning framework integrating multi-annotator consistency modeling, soft-label learning, and uncertainty-aware training. To our knowledge, this is the first systematic evaluation of disagreement-aware modeling for non-English hate speech detection, and we introduce the first standardized multi-annotator benchmark for this task. Our method explicitly models annotation uncertainty, treating annotator disagreement as a structured supervisory signal rather than noise, thereby enhancing model robustness and generalization. Evaluated on a real-world Turkish tweet dataset with multiple annotations per instance, our approach achieves state-of-the-art performance, significantly outperforming single-annotator baselines in accuracy. The framework offers a transferable methodological paradigm for low-resource, high-subjectivity text classification tasks.
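
To make the soft-label idea mentioned above concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: it assumes binary labels (0 = not hate, 1 = hate), and the function names `soft_label`, `majority_label`, and `soft_cross_entropy` are hypothetical. It shows how the same set of annotator votes can be reduced to a single hard label by majority vote, or kept as a probability distribution that a classifier is trained against.

```python
import math
from collections import Counter


def soft_label(votes, num_classes=2):
    """Turn one tweet's annotator votes into a probability distribution."""
    counts = Counter(votes)
    return [counts.get(c, 0) / len(votes) for c in range(num_classes)]


def majority_label(votes):
    """Hard-label baseline: most frequent vote (ties go to the smallest class id)."""
    counts = Counter(votes)
    best = max(counts.values())
    return min(c for c, n in counts.items() if n == best)


def soft_cross_entropy(logits, target):
    """Cross-entropy between a soft target distribution and softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    return -sum(t * lp for t, lp in zip(target, log_probs))


# Hypothetical example: two of three annotators marked the tweet as hate speech.
votes = [1, 1, 0]
target = soft_label(votes)        # keeps the 1/3 vs 2/3 split
hard = majority_label(votes)      # collapses the split to a single class
loss = soft_cross_entropy([0.0, 0.0], target)  # loss for an undecided model
```

With hard labels the dissenting vote is discarded before training; with soft labels the loss penalizes a model for being more confident than the annotators were. In a fine-tuning setup the `logits` would come from the classifier head instead of being written by hand.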

๐Ÿ“ Abstract
Hate speech detection is a crucial task, especially on social media, where harmful content can spread quickly. Implementing machine learning models to automatically identify and address hate speech is essential for mitigating its impact and preventing its proliferation. The first step in developing an effective hate speech detection model is to acquire a high-quality dataset for training. Labeled data is foundational for most natural language processing tasks, but categorizing hate speech is difficult because it is diverse and often subjective, which can lead to varying interpretations and disagreements among annotators. This paper examines strategies for addressing annotator disagreement, an issue that has been largely overlooked. In particular, we evaluate different approaches to dealing with annotator disagreement in hate speech classification of Turkish tweets, based on a fine-tuned BERT model. Our work highlights the importance of the problem and provides state-of-the-art benchmark results for the detection and understanding of hate speech in online discourse.
Problem

Research questions and friction points this paper is trying to address.

Addressing annotator disagreement in hate speech classification.
Improving hate speech detection in Turkish tweets using BERT.
Developing strategies for high-quality hate speech datasets.
Innovation

Methods, ideas, or system contributions that make the work stand out.

BERT model fine-tuning
Annotator disagreement strategies
Turkish tweets classification
Somaiyeh Dehghan
Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul, Turkey 34956; Center of Excellence in Data Analytics (VERIM), Sabanci University, Istanbul, Turkey 34956
Mehmet Umut Sen
Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul, Turkey 34956; Center of Excellence in Data Analytics (VERIM), Sabanci University, Istanbul, Turkey 34956
Berrin Yanikoglu
Professor of Computer Science, Sabanci University
Image understanding, Biometrics, Handwriting recognition, NLP