Towards Fairness Assessment of Dutch Hate Speech Detection

📅 2025-06-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the lack of fairness evaluation in Dutch hate speech detection by conducting the first systematic counterfactual fairness analysis for the language. We propose a grammar- and context-aware counterfactual generation strategy (MGS/SLL), construct a lexicon of Dutch social group terms, and leverage large language models to generate high-quality counterfactual instances. We further assess fairness with Counterfactual Token Fairness (CTF), a fine-grained instance-level metric. Using fine-tuned Transformer models, we jointly optimize for predictive performance and fairness, achieving improvements in accuracy, average counterfactual fairness, and group-level fairness (e.g., equality of odds). Our work fills a critical gap in fairness research for hate speech detection in non-English, low-resource languages and provides a reusable methodological framework for fair NLP in under-resourced linguistic settings.

📝 Abstract
Numerous studies have proposed computational methods to detect hate speech online, yet most focus on the English language and emphasize model development. In this study, we evaluate the counterfactual fairness of hate speech detection models in the Dutch language, specifically examining the performance and fairness of transformer-based models. We make the following key contributions. First, we curate a list of Dutch Social Group Terms that reflect social context. Second, we generate counterfactual data for Dutch hate speech using LLMs and established strategies like Manual Group Substitution (MGS) and Sentence Log-Likelihood (SLL). Through qualitative evaluation, we highlight the challenges of generating realistic counterfactuals, particularly with Dutch grammar and contextual coherence. Third, we fine-tune baseline transformer-based models with counterfactual data and evaluate their performance in detecting hate speech. Fourth, we assess the fairness of these models using Counterfactual Token Fairness (CTF) and group fairness metrics, including equality of odds and demographic parity. Our analysis shows that models fine-tuned with counterfactual data improve in hate speech detection, average counterfactual fairness, and group fairness. This work addresses a significant gap in the literature on counterfactual fairness for hate speech detection in Dutch and provides practical insights and recommendations for improving both model performance and fairness.
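The Manual Group Substitution (MGS) strategy described in the abstract can be sketched as a simple term swap: replace a social group term with each other term in the lexicon to produce counterfactual variants. The term list below is a tiny illustrative sample, not the paper's curated lexicon, and the tokenization is deliberately naive; as the abstract notes, realistic Dutch counterfactuals also require handling grammatical agreement and context, which this sketch does not attempt.

```python
# A few illustrative Dutch social group terms (a stand-in for the
# paper's curated lexicon, which is much larger).
GROUP_TERMS = ["moslims", "joden", "vrouwen", "vluchtelingen"]


def mgs_counterfactuals(sentence: str) -> list[str]:
    """Return counterfactual variants of `sentence`: for each group term
    found, substitute every other group term in its place."""
    tokens = sentence.split()
    variants = []
    for i, tok in enumerate(tokens):
        if tok.lower() in GROUP_TERMS:
            for sub in GROUP_TERMS:
                if sub != tok.lower():
                    variants.append(" ".join(tokens[:i] + [sub] + tokens[i + 1:]))
    return variants
```

A sentence containing no group term yields no variants; a sentence with one term yields one variant per remaining lexicon entry.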
Problem

Research questions and friction points this paper is trying to address.

Assessing fairness in Dutch hate speech detection models
Generating realistic Dutch counterfactuals for fairness evaluation
Improving transformer models' performance and fairness in Dutch
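The group fairness metrics named in the abstract (demographic parity and equality of odds) can be sketched over per-group predictions. This is a minimal, assumption-laden version: labels are binary (1 = hate), the gap definitions follow the standard formulations, and the function names are illustrative rather than the paper's.

```python
def positive_rate(y_pred: list[int]) -> float:
    """Fraction of instances predicted positive (hate)."""
    return sum(y_pred) / len(y_pred)


def demographic_parity_gap(pred_a: list[int], pred_b: list[int]) -> float:
    """Absolute difference in positive-prediction rates between two groups;
    0.0 means demographic parity holds."""
    return abs(positive_rate(pred_a) - positive_rate(pred_b))


def equality_of_odds_gap(true_a, pred_a, true_b, pred_b) -> float:
    """Largest gap in TPR or FPR between two groups; 0.0 means
    equality of odds holds."""
    def rate(truth, pred, label):
        # Prediction rate restricted to instances with true label `label`:
        # label=1 gives the TPR, label=0 gives the FPR.
        pairs = [p for t, p in zip(truth, pred) if t == label]
        return sum(pairs) / len(pairs)

    tpr_gap = abs(rate(true_a, pred_a, 1) - rate(true_b, pred_b, 1))
    fpr_gap = abs(rate(true_a, pred_a, 0) - rate(true_b, pred_b, 0))
    return max(tpr_gap, fpr_gap)
```

In practice the groups would be defined by which social group term a counterfactual instance mentions, so the same sentence contributes to several groups across its variants.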
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates Dutch counterfactual data using LLMs
Fine-tunes transformer models with counterfactuals
Assesses fairness via CTF and group metrics
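The CTF assessment above can be sketched as a score-gap computation: compare a model's score on the original text against its scores on the counterfactual variants. The exact definition used in the paper may differ in detail; this follows the common formulation (mean absolute score difference), and `score` stands for any hypothetical model scoring function returning a hate probability.

```python
from typing import Callable


def ctf_gap(score: Callable[[str], float],
            original: str,
            counterfactuals: list[str]) -> float:
    """Counterfactual Token Fairness gap for one instance: the mean
    absolute difference between the model score on the original text and
    on each counterfactual. 0.0 means the prediction is unchanged by
    group-term substitution (perfectly counterfactually fair here)."""
    base = score(original)
    return sum(abs(score(cf) - base) for cf in counterfactuals) / len(counterfactuals)
```

Averaging `ctf_gap` over a test set gives the kind of average counterfactual fairness figure the summary reports improvements on.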