Improving Hate Speech Classification with Cross-Taxonomy Dataset Integration

📅 2025-03-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Hate speech detection suffers from poor generalizability due to heterogeneous definitions and dataset annotations across domains. Method: This paper introduces the first universal cross-taxonomy framework for semantic alignment and joint modeling of multi-source annotation standards (e.g., HateSpeech18, FDCL19). We propose a BERT-based multitask learning architecture integrating terminology mapping, label-space alignment, and adversarial domain adaptation, enabling a single model to jointly classify hate speech subtypes under legal, platform-specific, and research-oriented definitions. Contribution/Results: Our approach achieves a 6.2-point F1-score improvement on independent test sets and an average 11.4% gain in cross-definition transfer accuracy. It significantly reduces reliance on multiple specialized models, offering a scalable, interpretable, and unified modeling paradigm for hate speech identification.

Technology Category

Application Category

📝 Abstract
Algorithmic hate speech detection faces significant challenges due to the diverse definitions and datasets used in research and practice. Social media platforms, legal frameworks, and institutions each apply distinct yet overlapping definitions, complicating classification efforts. This study addresses these challenges by demonstrating that existing datasets and taxonomies can be integrated into a unified model, enhancing prediction performance and reducing reliance on multiple specialized classifiers. The work introduces a universal taxonomy and a hate speech classifier capable of detecting a wide range of definitions within a single framework. Our approach is validated by combining two widely used but differently annotated datasets, showing improved classification performance on an independent test set. This work highlights the potential of dataset and taxonomy integration in advancing hate speech detection, increasing efficiency, and ensuring broader applicability across contexts.
Problem

Research questions and friction points this paper is trying to address.

Integrates diverse hate speech datasets into a unified model.
Proposes a universal taxonomy for broader hate speech detection.
Enhances classification performance across different contexts.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrated cross-taxonomy datasets for unified model
Developed universal taxonomy for hate speech classification
Enhanced classifier performance with dataset integration
🔎 Similar Papers
No similar papers found.
J
Jan Fillies
Institut für Angewandte Informatik, Leipzig, Germany; Freie Universität Berlin, Berlin, Germany
Adrian Paschke
Adrian Paschke
Professor, Computer Science, Freie Universitaet Berlin
Corporate Semantic WebMachine LearningArtificial IntelligenceData AnalyticsSemantic Technologies