Advancing Hate Speech Detection with Transformers: Insights from the MetaHate Dataset

📅 2025-08-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Online hate speech poses severe societal risks and is increasingly linked to real-world criminal activity, necessitating robust, cross-platform automated detection methods. This study conducts the first systematic evaluation of mainstream Transformer models (BERT, RoBERTa, GPT-2, and ELECTRA) on MetaHate, a large-scale, multi-source dataset comprising 36 heterogeneous subsets. We specifically investigate fine-grained classification challenges arising from sarcasm, implicit semantics, and label noise. Through transfer learning with domain-adaptive fine-tuning, ELECTRA achieves a state-of-the-art F1 score of 0.8980. Our work not only validates the effectiveness of Transformer architectures for detecting heterogeneous hate speech but also establishes the first benchmark framework enabling cross-subset, multi-style evaluation. This framework provides critical empirical foundations for future model development and data curation practices in hate speech research.

📝 Abstract
Hate speech is a widespread and harmful form of online discourse, encompassing slurs and defamatory posts that can have serious social, psychological, and sometimes physical impacts on targeted individuals and communities. As social media platforms such as X (formerly Twitter), Facebook, Instagram, Reddit, and others continue to facilitate widespread communication, they also become breeding grounds for hate speech, which has increasingly been linked to real-world hate crimes. Addressing this issue requires the development of robust automated methods to detect hate speech in diverse social media environments. Deep learning approaches, such as vanilla recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and convolutional neural networks (CNNs), have achieved good results, but are often limited by difficulty modeling long-range dependencies and by inefficient parallelization. This study presents the first comprehensive exploration of transformer-based models for hate speech detection using the MetaHate dataset, a meta-collection of 36 datasets with 1.2 million social media samples. We evaluate multiple state-of-the-art transformer models, including BERT, RoBERTa, GPT-2, and ELECTRA, with fine-tuned ELECTRA achieving the highest performance (F1 score: 0.8980). We also analyze classification errors, revealing challenges with sarcasm, coded language, and label noise.
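The headline metric, an F1 score of 0.8980, is the harmonic mean of precision and recall on the binary hate/not-hate task. A minimal sketch of how such a score is computed from predictions (the toy labels below are illustrative, not the paper's data):

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy labels: 1 = hate, 0 = not hate
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print(round(f1_score(y_true, y_pred), 4))  # → 0.8571
```

Because F1 ignores true negatives, it is a common choice for hate speech benchmarks, where the positive (hateful) class is typically the minority.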
Problem

Research questions and friction points this paper is trying to address.

Detecting hate speech in diverse social media environments
Overcoming limitations of traditional deep learning methods
Addressing challenges with sarcasm and coded language
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based models for hate speech detection
Fine-tuned ELECTRA achieves highest performance
MetaHate dataset with 1.2 million samples
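The fine-tuning recipe above can be sketched with the Hugging Face transformers library. The checkpoint name, hyperparameters, and label mapping below are assumptions for illustration; the paper's exact configuration is not given here. Model download and training are kept behind the main guard so the helpers can be read on their own:

```python
# Hedged sketch: fine-tuning ELECTRA for binary hate speech classification.
# Checkpoint and hyperparameters are illustrative assumptions, not the paper's.
LABELS = {0: "not-hate", 1: "hate"}


def encode(texts, tokenizer, max_length=128):
    """Tokenize a batch of posts into fixed-length model inputs."""
    return tokenizer(texts, truncation=True, padding="max_length",
                     max_length=max_length)


if __name__ == "__main__":
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    checkpoint = "google/electra-base-discriminator"  # assumed base model
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=len(LABELS))
    args = TrainingArguments(output_dir="electra-metahate",
                             per_device_train_batch_size=32,
                             learning_rate=2e-5,
                             num_train_epochs=3)
    # train_dataset / eval_dataset would come from tokenized MetaHate splits:
    # trainer = Trainer(model=model, args=args,
    #                   train_dataset=..., eval_dataset=...)
    # trainer.train()
```

Transfer learning here means the pretrained discriminator weights are reused and only the classification head plus encoder are updated on MetaHate, which is what makes cross-subset evaluation of one model practical.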