A Survey of Machine Learning Models and Datasets for the Multi-label Classification of Textual Hate Speech in English

📅 2025-04-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Online hate speech often falls into several overlapping sub-types, such as target, severity, or legality, so binary classification is frequently insufficient in practice and multi-label classification is needed. Method: We present the first systematic survey of the English-language literature on multi-label classification of textual hate speech (N=46), covering 28 datasets and 24 publications that propose classification models. Contribution/Results: We find substantial heterogeneity across datasets in label set, size, meta-concept, annotation process, and inter-annotator agreement (IAA), and inconsistent evaluation across model publications, together with a preference for architectures based on BERT and RNNs. Critical open issues are imbalanced training data, reliance on crowdsourcing platforms, small and sparse datasets, and missing methodological alignment. Based on these findings, we formulate ten recommendations for research.

📝 Abstract

The dissemination of online hate speech can have serious negative consequences for individuals, online communities, and entire societies. This, together with the large volume of hateful online content, has prompted interest among both practitioners, e.g., in content moderation or law enforcement, and researchers in machine learning models that automatically classify instances of hate speech. Whereas most scientific works address hate speech classification as a binary task, practice often requires a differentiation into sub-types, e.g., according to target, severity, or legality, which may overlap for individual content. Hence, researchers have created datasets and machine learning models that approach hate speech classification in textual data as a multi-label problem. This work presents the first systematic and comprehensive survey of scientific literature on this emerging research landscape in English (N=46). We contribute a concise overview of 28 datasets suited for training multi-label classification models, which reveals significant heterogeneity regarding label set, size, meta-concept, annotation process, and inter-annotator agreement. Our analysis of 24 publications proposing suitable classification models further establishes inconsistency in evaluation and a preference for architectures based on Bidirectional Encoder Representations from Transformers (BERT) and Recurrent Neural Networks (RNNs). We identify imbalanced training data, reliance on crowdsourcing platforms, small and sparse datasets, and missing methodological alignment as critical open issues and formulate ten recommendations for research.
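To make the multi-label framing in the abstract concrete, the following minimal sketch encodes overlapping sub-types as multi-hot vectors and scores a set of predictions. The label names, the toy annotations, and the helper functions are assumptions made for illustration only; they are not taken from any of the surveyed datasets or models.

```python
from typing import List, Set

# Hypothetical label set; the surveyed datasets use many different taxonomies.
LABELS: List[str] = ["target_religion", "target_gender", "threat"]


def to_multi_hot(active: Set[str], labels: List[str] = LABELS) -> List[int]:
    """Encode a set of active labels as a 0/1 vector, one position per label."""
    unknown = active - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in active else 0 for name in labels]


def hamming_loss(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of individual label decisions that are wrong."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for t_row, p_row in zip(y_true, y_pred)
        for t, p in zip(t_row, p_row)
    )
    return wrong / total


def subset_accuracy(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of examples whose complete label vector is predicted exactly."""
    exact = sum(t_row == p_row for t_row, p_row in zip(y_true, y_pred))
    return exact / len(y_true)


# Toy annotations: the first text carries two overlapping labels,
# the third text carries none (i.e., it is not hateful under this label set).
y_true = [
    to_multi_hot({"target_religion", "threat"}),
    to_multi_hot({"target_gender"}),
    to_multi_hot(set()),
]
# Toy predictions: the model misses the "threat" label on the first text.
y_pred = [
    to_multi_hot({"target_religion"}),
    to_multi_hot({"target_gender"}),
    to_multi_hot(set()),
]

print(hamming_loss(y_true, y_pred))
print(subset_accuracy(y_true, y_pred))
```

A binary classifier would collapse each vector to a single hateful/not-hateful bit. The multi-hot form keeps overlapping sub-types separate, and the two scores penalize the same single missed label very differently: one wrong decision out of nine versus one wrong example out of three.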

Problem

Research questions and friction points this paper is trying to address.

Surveying multi-label hate speech classification models and datasets

Analyzing dataset heterogeneity and model evaluation inconsistencies

Identifying open issues in hate speech classification research
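One way evaluation inconsistency and imbalanced data can interact is through the choice of F1 averaging. The sketch below, on made-up predictions, computes micro- and macro-averaged F1 for the same model output. The data, the function names, and the convention of scoring a label as 0.0 when it has no true or predicted positives are assumptions for illustration, not a method described in the survey.

```python
from typing import List, Tuple


def label_counts(
    y_true: List[List[int]], y_pred: List[List[int]], j: int
) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for label j."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        t, p = t_row[j], p_row[j]
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
    return tp, fp, fn


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from counts; defined here as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Tuple[float, float]:
    """Micro F1 pools counts over labels; macro F1 averages per-label F1."""
    n_labels = len(y_true[0])
    counts = [label_counts(y_true, y_pred, j) for j in range(n_labels)]
    macro = sum(f1(*c) for c in counts) / n_labels
    micro = f1(
        sum(c[0] for c in counts),
        sum(c[1] for c in counts),
        sum(c[2] for c in counts),
    )
    return micro, macro


# Imbalanced toy data: label 0 is frequent and always predicted correctly,
# label 1 is rare and never predicted.
y_true = [[1, 0]] * 8 + [[0, 1]] * 2
y_pred = [[1, 0]] * 8 + [[0, 0]] * 2

micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

On this toy data the micro average stays high because the frequent label dominates the pooled counts, while the macro average drops to 0.5 because the rare label is missed entirely. Two papers reporting different averages for the same model would therefore not be directly comparable.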

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-label classification for hate speech

BERT- and RNN-based model architectures

Survey of 28 diverse training datasets
