🤖 AI Summary
Online hate speech detection relies heavily on human-annotated data that is prone to social bias, yet the systematic biases arising from interactions between annotator and target demographic attributes remain empirically uncharacterized. This paper presents the first quantitative analysis of bidirectional annotator–target demographic interactions, using a large-scale dataset rich in annotator and target sociodemographic metadata, to characterize the magnitude, distribution, and patterns of human bias. It further introduces a role-based prompting framework for large language models (LLMs) to comparatively evaluate their bias profiles. Results reveal pronounced intergroup heterogeneity in human bias, whereas LLM bias exhibits stronger contextual dependence and does not merely replicate human identity-based biases. These findings establish a new empirical benchmark and methodological foundation for developing fair, interpretable hate speech detection systems.
📝 Abstract
The rise of online platforms has exacerbated the spread of hate speech, demanding scalable and effective detection. However, the accuracy of hate speech detection systems relies heavily on human-labeled data, which is inherently susceptible to biases. While previous work has examined this issue, the interplay between the characteristics of annotators and those of the targets of hate is still unexplored. We fill this gap by leveraging an extensive dataset with rich socio-demographic information on both annotators and targets, uncovering how human biases manifest in relation to the targets' attributes. Our analysis surfaces widespread biases, which we quantitatively describe and characterize by their intensity and prevalence, revealing marked differences across groups. Furthermore, we compare human biases with those exhibited by persona-based LLMs. Our findings indicate that while persona-based LLMs do exhibit biases, these differ significantly from those of human annotators. Overall, our work offers new and nuanced results on human biases in hate speech annotations, as well as fresh insights into the design of AI-driven hate speech detection systems.
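The persona-based (role-based) prompting setup mentioned above can be sketched roughly as follows; the persona attributes and prompt wording here are illustrative assumptions, not the paper's actual template or label scheme:

```python
# Sketch of persona-based prompting for hate speech annotation.
# The persona fields and prompt template are hypothetical examples.

from dataclasses import dataclass


@dataclass
class Persona:
    """Socio-demographic profile assigned to the LLM annotator."""
    gender: str
    age_group: str
    ethnicity: str


def build_annotation_prompt(persona: Persona, text: str) -> str:
    """Condition the model on a persona, then request a binary hate label."""
    return (
        f"You are a {persona.age_group} {persona.ethnicity} {persona.gender} "
        "content moderator.\n"
        "Label the following post as HATEFUL or NOT_HATEFUL.\n\n"
        f"Post: {text}\n"
        "Label:"
    )


# Querying the same post under different personas, and comparing the
# resulting labels, surfaces persona-dependent bias in the model,
# analogous to comparing human annotator groups.
personas = [
    Persona(gender="woman", age_group="young", ethnicity="Black"),
    Persona(gender="man", age_group="older", ethnicity="white"),
]
prompts = [build_annotation_prompt(p, "example post text") for p in personas]
```

Each prompt would then be sent to the LLM, and label disagreements across personas aggregated per target group to obtain a bias profile comparable to the human one.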