🤖 AI Summary
Online hate speech detection relies heavily on human-annotated data that is prone to social bias, yet the systematic biases arising from interactions between annotator and target demographic attributes remain empirically uncharacterized. This paper presents the first quantitative analysis of bidirectional annotator–target demographic interactions, using a large-scale dataset rich in annotator and target sociodemographic metadata, to characterize the magnitude, distribution, and patterns of human bias. It further introduces a role-based prompting framework for large language models (LLMs) to comparatively evaluate their bias profiles. Results reveal pronounced intergroup heterogeneity in human bias, whereas LLM bias exhibits stronger contextual dependence and does not merely replicate human identity-based biases. These findings establish a new empirical benchmark and methodological foundation for developing fair, interpretable hate speech detection systems.
📝 Abstract
The rise of online platforms has exacerbated the spread of hate speech, demanding scalable and effective detection. However, the accuracy of hate speech detection systems relies heavily on human-labeled data, which is inherently susceptible to biases. While previous work has examined this issue, the interplay between the characteristics of annotators and those of the targets of hate is still unexplored. We fill this gap by leveraging an extensive dataset with rich socio-demographic information on both annotators and targets, uncovering how human biases manifest in relation to the targets' attributes. Our analysis surfaces widespread biases, which we quantitatively describe and characterize by their intensity and prevalence, revealing marked differences across groups. Furthermore, we compare human biases with those exhibited by persona-based LLMs. Our findings indicate that while persona-based LLMs do exhibit biases, these differ significantly from those of human annotators. Overall, our work offers new and nuanced results on human biases in hate speech annotations, as well as fresh insights into the design of AI-driven hate speech detection systems.
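The persona-based (role-based) prompting setup mentioned above can be sketched roughly as follows; the persona attributes and prompt wording here are illustrative assumptions, not the paper's actual template or label scheme:

```python
# Sketch of persona-based prompting for hate speech annotation.
# The persona fields and prompt template are hypothetical examples.

from dataclasses import dataclass


@dataclass
class Persona:
    """Socio-demographic profile assigned to the LLM annotator."""
    gender: str
    age_group: str
    ethnicity: str


def build_annotation_prompt(persona: Persona, text: str) -> str:
    """Condition the model on a persona, then request a binary hate label."""
    return (
        f"You are a {persona.age_group} {persona.ethnicity} {persona.gender} "
        "content moderator.\n"
        "Label the following post as HATEFUL or NOT_HATEFUL.\n\n"
        f"Post: {text}\n"
        "Label:"
    )


# Querying the same post under different personas, and comparing the
# resulting labels, surfaces persona-dependent bias in the model,
# analogous to comparing human annotator groups.
personas = [
    Persona(gender="woman", age_group="young", ethnicity="Black"),
    Persona(gender="man", age_group="older", ethnicity="white"),
]
prompts = [build_annotation_prompt(p, "example post text") for p in personas]
```

Each prompt would then be sent to the LLM, and label disagreements across personas aggregated per target group to obtain a bias profile comparable to the human one.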