Dutch CrowS-Pairs: Adapting a Challenge Dataset for Measuring Social Biases in Language Models for Dutch

📅 2025-07-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the absence of social bias evaluation for Dutch language models by introducing Dutch CrowS-Pairs—the first Dutch-specific bias benchmark—comprising 1,463 sentence pairs covering nine bias categories, including gender, sexual orientation, and disability. Following the CrowS-Pairs methodology, the dataset was translated and culturally adapted. The authors conduct systematic quantitative bias assessments across five models: BERTje, RobBERT, mBERT, GEITje, and Mistral-7B. Results show that dedicated Dutch models exhibit lower overall bias than English counterparts, yet remain substantially biased; multilingual models (e.g., mBERT) display distinct bias patterns compared to monolingual ones; and assigning a persona to a model changes the level of bias it exhibits. This study fills a gap in bias evaluation for languages other than English, demonstrates the influence of linguistic and cultural context on bias manifestation, and provides a reproducible benchmark and methodological framework for cross-lingual fairness research.

📝 Abstract
Warning: This paper contains explicit statements of offensive stereotypes which might be upsetting. Language models are prone to exhibiting biases, further amplifying unfair and harmful stereotypes. Given the fast-growing popularity and wide application of these models, it is necessary to ensure safe and fair language models. Recently, considerable attention has been paid to measuring bias in language models, yet the majority of studies have focused only on the English language. A Dutch version of the US-specific CrowS-Pairs dataset for measuring bias in Dutch language models is introduced. The resulting dataset consists of 1,463 sentence pairs that cover bias in 9 categories, such as Sexual orientation, Gender and Disability. The sentence pairs are composed of contrasting sentences, where one of the sentences concerns disadvantaged groups and the other advantaged groups. Using the Dutch CrowS-Pairs dataset, we show that various language models—BERTje, RobBERT, multilingual BERT, GEITje and Mistral-7B—exhibit substantial bias across the various bias categories. Using the English and French versions of the CrowS-Pairs dataset, bias was evaluated in English (BERT and RoBERTa) and French (FlauBERT and CamemBERT) language models, and it was shown that English models exhibit the most bias, whereas Dutch models exhibit the least. Additionally, results indicate that assigning a persona to a language model changes the level of bias it exhibits. These findings highlight the variability of bias across languages and contexts, suggesting that cultural and linguistic factors play a significant role in shaping model biases.
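The pair-comparison idea behind CrowS-Pairs can be sketched as follows: a model scores both sentences of each contrasting pair, and the bias metric is the percentage of pairs for which the model prefers the sentence about the disadvantaged group. This is a minimal illustration of the metric only; the pseudo-log-likelihood values below are hypothetical placeholders, not outputs from any of the evaluated models.

```python
def bias_score(pair_scores):
    """Percentage of pairs where the model assigns a higher
    pseudo-log-likelihood to the stereotypical sentence.
    pair_scores: list of (stereo_pll, antistereo_pll) tuples.
    A score near 50% indicates no preference either way."""
    preferred = sum(1 for stereo, anti in pair_scores if stereo > anti)
    return 100.0 * preferred / len(pair_scores)

# Hypothetical pseudo-log-likelihoods for three sentence pairs;
# in the paper these come from masking each shared token in turn
# and summing the model's log-probabilities for it.
example = [(-42.1, -45.3), (-38.7, -37.9), (-50.2, -51.0)]
print(round(bias_score(example), 1))
```

In this toy example the model prefers the stereotypical sentence in 2 of 3 pairs, giving a score of about 66.7%; an unbiased model would sit near 50%.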
Problem

Research questions and friction points this paper is trying to address.

Social biases in Dutch language models have not previously been measured systematically
No Dutch-specific bias benchmark exists, so the CrowS-Pairs dataset must be adapted and culturally localized
It is unknown how bias levels in Dutch models compare with those in English and French models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapted CrowS-Pairs dataset for Dutch language models
Evaluated bias in Dutch models across nine categories
Compared bias levels across multiple languages and models
Elza Strazda
Department of Advanced Computing Sciences, Maastricht University
Gerasimos Spanakis
Maastricht University
Assistant Professor