🤖 AI Summary
This study addresses the challenge that online hate speech is often disguised as verifiable claims, requiring joint assessment of both harmfulness and verifiability, a task poorly handled by conventional approaches. To bridge this gap, the authors propose the first unified framework for jointly modeling hate speech detection and claim check-worthiness, introducing WSF-ARG+, the first dataset dual-annotated for both tasks. They further design an LLM-in-the-loop human-AI collaborative annotation pipeline that integrates twelve open-weight large language models, substantially reducing manual annotation cost without compromising quality. Experimental results show that hate speech containing check-worthy claims tends to be more aggressive. Moreover, incorporating check-worthiness labels significantly improves hate speech detection, yielding a maximum macro-F1 gain of 0.213 and an average gain of 0.154 across the evaluated large models.
📝 Abstract
Hateful content online is often expressed using fact-like, though not necessarily correct, information, especially in coordinated online harassment campaigns and extremist propaganda. Failing to jointly address hate speech (HS) and misinformation can deepen prejudice, reinforce harmful stereotypes, and expose bystanders to psychological distress, while polluting public debate. Moreover, such messages demand more effort from content moderators, who must assess both their harmfulness and their veracity, i.e., fact-check them. To address this challenge, we release WSF-ARG+, the first dataset that combines hate speech with check-worthiness information. We also introduce a novel LLM-in-the-loop framework to facilitate the annotation of check-worthy claims, testing it with 12 open-weight LLMs of different sizes and architectures. We validate the framework through extensive human evaluation and show that it reduces human effort without compromising annotation quality. Finally, we show that HS messages containing check-worthy claims exhibit significantly higher levels of harassment and hate, and that incorporating check-worthiness labels improves LLM-based HS detection by up to 0.213 macro-F1, and by 0.154 macro-F1 on average for large models.
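The abstract does not detail how the LLM-in-the-loop pipeline combines the 12 models' outputs, so the following is only a minimal, hypothetical sketch of one plausible step: majority-vote aggregation over per-model labels, with low-agreement items routed to human annotators. The function name, threshold, and labels are illustrative assumptions, not the paper's actual method.

```python
from collections import Counter

def aggregate_llm_labels(labels, agreement_threshold=0.75):
    """Majority-vote over per-model labels; flag low-agreement items for humans.

    Hypothetical illustration of an LLM-in-the-loop step; the paper's
    actual aggregation and routing logic is not specified in the abstract.
    """
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    agreement = votes / len(labels)
    # Items the models disagree on are escalated to human annotators.
    needs_human = agreement < agreement_threshold
    return label, agreement, needs_human

# Example: 12 model votes on whether a message contains a check-worthy claim
votes = ["check-worthy"] * 9 + ["not-check-worthy"] * 3
label, agreement, needs_human = aggregate_llm_labels(votes)
# label == "check-worthy", agreement == 0.75, needs_human == False
```

Under such a scheme, human effort is spent only on the contested items, which is consistent with the abstract's claim of reduced annotation cost at preserved quality.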