EsBBQ and CaBBQ: The Spanish and Catalan Bias Benchmarks for Question Answering

📅 2025-07-15
🤖 AI Summary
Problem: Existing social bias evaluations for large language models (LLMs) rely heavily on English and U.S.-centric contexts, leaving few systematic, localized benchmarking resources for other languages and societies. Method: The paper introduces EsBBQ and CaBBQ, two parallel Spanish- and Catalan-language bias benchmarks adapted to the social context of Spain. Built on the original BBQ framework, they cover 10 social categories in a multiple-choice question-answering setting and are used to evaluate LLMs of different families, sizes, and variants. Contribution/Results: The work adapts BBQ both linguistically (to Spanish and Catalan) and culturally (to Spain). The evaluation shows that models tend to fail to choose the correct answer in ambiguous scenarios, and that high QA accuracy often correlates with greater reliance on social biases.

📝 Abstract
Previous literature has largely shown that Large Language Models (LLMs) perpetuate social biases learnt from their pre-training data. Given the notable lack of resources for social bias evaluation in languages other than English, and for social contexts outside of the United States, this paper introduces the Spanish and the Catalan Bias Benchmarks for Question Answering (EsBBQ and CaBBQ). Based on the original BBQ, these two parallel datasets are designed to assess social bias across 10 categories using a multiple-choice QA setting, now adapted to the Spanish and Catalan languages and to the social context of Spain. We report evaluation results on different LLMs, factoring in model family, size and variant. Our results show that models tend to fail to choose the correct answer in ambiguous scenarios, and that high QA accuracy often correlates with greater reliance on social biases.
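
The multiple-choice setting described above can be illustrated with a minimal sketch of how accuracy and a bias score might be computed per context type. This is not the authors' evaluation code: the `Item` record and the functions `accuracy`, `raw_bias`, and `bias_scores` are hypothetical names, the toy predictions are made up for illustration, and the bias score follows the definition in the original BBQ paper (biased answers as a share of non-"unknown" answers, rescaled to [-1, 1], and scaled by the error rate in ambiguous contexts). EsBBQ and CaBBQ may define their scores differently.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One evaluated instance (hypothetical schema, not the dataset's)."""
    condition: str  # "ambig" or "disambig"
    gold: int       # index of the correct option
    biased: int     # index of the stereotype-aligned option
    unknown: int    # index of the "unknown" option
    pred: int       # option chosen by the model


def accuracy(items):
    """Fraction of items where the model chose the correct option."""
    if not items:
        return 0.0
    return sum(i.pred == i.gold for i in items) / len(items)


def raw_bias(items):
    """Original-BBQ-style score in [-1, 1] over non-'unknown' answers."""
    answered = [i for i in items if i.pred != i.unknown]
    if not answered:
        return 0.0  # the model always abstained, so no bias is measurable
    n_biased = sum(i.pred == i.biased for i in answered)
    return 2 * n_biased / len(answered) - 1


def bias_scores(items):
    """Accuracy and bias score for ambiguous and disambiguated contexts."""
    ambig = [i for i in items if i.condition == "ambig"]
    disambig = [i for i in items if i.condition == "disambig"]
    return {
        "acc_ambig": accuracy(ambig),
        "acc_disambig": accuracy(disambig),
        # In ambiguous contexts the score is scaled by the error rate.
        "bias_ambig": (1 - accuracy(ambig)) * raw_bias(ambig),
        "bias_disambig": raw_bias(disambig),
    }


# Toy predictions, invented for illustration only.
toy = [
    Item("ambig", gold=2, biased=0, unknown=2, pred=2),
    Item("ambig", gold=2, biased=0, unknown=2, pred=0),
    Item("ambig", gold=2, biased=0, unknown=2, pred=0),
    Item("ambig", gold=2, biased=0, unknown=2, pred=1),
    Item("disambig", gold=1, biased=0, unknown=2, pred=1),
    Item("disambig", gold=0, biased=0, unknown=2, pred=0),
]
scores = bias_scores(toy)
print(scores)
```

In ambiguous contexts the correct option is always "unknown", so any other choice counts as an error; the bias score then indicates whether those errors lean toward the stereotype-aligned option.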
Problem

Research questions and friction points this paper is trying to address.

Assessing social bias in LLMs for Spanish and Catalan
Adapting bias benchmarks to Spain's social context
Examining how QA accuracy relates to reliance on social biases
Innovation

Methods, ideas, or system contributions that make the work stand out.

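Creation of two parallel bias benchmarks, EsBBQ (Spanish) and CaBBQ (Catalan)
Adaptation of BBQ's multiple-choice QA setting to Spain's social context, covering 10 categories
Evaluation of LLMs by family, size, and variant, relating QA accuracy to reliance on social biases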
Authors
Valle Ruiz-Fernández
Barcelona Supercomputing Center (BSC-CNS)
Mario Mina
Barcelona Supercomputing Center (BSC-CNS)
Júlia Falcão
Barcelona Supercomputing Center (BSC)
NLP, AI ethics, bias, LLM evaluation
Luis Vasquez-Reina
Barcelona Supercomputing Center (BSC-CNS)
Anna Sallés
Barcelona Supercomputing Center (BSC-CNS)
Aitor Gonzalez-Agirre
Barcelona Supercomputing Center (BSC)
Artificial Intelligence, Natural Language Processing, Semantics, Deep Learning
Olatz Perez-de-Viñaspre
HiTZ Center - IXA group, University of the Basque Country (UPV/EHU)
Natural Language Processing