Towards Generalizable Generic Harmful Speech Datasets for Implicit Hate Speech Detection

📅 2025-06-19
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work addresses the poor generalization capability of implicit hate speech detection models. We propose a labeling–enhancement co-design framework: first, we systematically validate the prevalence of unlabeled implicit instances in mainstream toxic speech datasets; second, we identify critical samples via dictionary-driven methods, perform expert re-annotation, and apply semantic enhancement using Llama-3 70B and GPT-4o to construct a more robust implicit hate speech dataset. This approach mitigates subjective biases inherent in crowdsourced annotation and enables, for the first time, systematic discovery and high-fidelity reconstruction of implicit hate speech in publicly available data. Experimental results demonstrate that our method improves implicit hate detection F1-score by 12.9 points under cross-dataset evaluation, significantly enhancing model generalization to unseen data.

📝 Abstract
Implicit hate speech has recently emerged as a critical challenge for social media platforms. While much of the research has traditionally focused on harmful speech in general, the need for generalizable techniques to detect veiled and subtle forms of hate has become increasingly pressing. Based on lexicon analysis, we hypothesize that implicit hate speech is already present in publicly available harmful speech datasets but may not have been explicitly recognized or labeled by annotators. Additionally, crowdsourced datasets are prone to mislabeling due to the complexity of the task and are often influenced by annotators' subjective interpretations. In this paper, we propose an approach to address the detection of implicit hate speech and enhance generalizability across diverse datasets by leveraging existing harmful speech datasets. Our method comprises three key components: influential sample identification, reannotation, and augmentation using Llama-3 70B and GPT-4o. Experimental results demonstrate the effectiveness of our approach in improving implicit hate detection, achieving a +12.9-point F1 score improvement compared to the baseline.
Problem

Research questions and friction points this paper is trying to address.

Detecting implicit hate speech in social media content
Improving generalizability across diverse harmful speech datasets
Addressing mislabeling issues in crowdsourced hate speech data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverage existing harmful speech datasets
Influential sample identification and reannotation
Augmentation using Llama-3 70B and GPT-4o
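The first step above, dictionary-driven identification of candidate implicit-hate samples, can be sketched roughly as a lexicon match over an existing dataset. This is a minimal illustration assuming a simple case-insensitive cue-term lookup; the `CUE_TERMS` lexicon and the `flag_candidates` helper are hypothetical placeholders, not the authors' actual resources or method.

```python
import re

# Hypothetical mini-lexicon of implicit-hate cue phrases; a real pipeline
# would use a curated lexicon rather than this toy set.
CUE_TERMS = {"those people", "go back", "their kind"}

def flag_candidates(samples, lexicon=CUE_TERMS):
    """Return samples containing any lexicon cue (case-insensitive substring match)."""
    patterns = [re.compile(re.escape(term), re.IGNORECASE) for term in lexicon]
    return [s for s in samples if any(p.search(s) for p in patterns)]

examples = [
    "Those people never follow the rules.",
    "Great weather today!",
]
print(flag_candidates(examples))  # -> ['Those people never follow the rules.']
```

Flagged candidates would then go to expert reannotation, and the reannotated set would be augmented with LLM-generated paraphrases (the paper uses Llama-3 70B and GPT-4o for this step).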