🤖 AI Summary
Existing bias evaluation frameworks focus predominantly on English classification tasks and fail to detect narrative biases in multilingual generative settings—particularly cultural stereotypes (e.g., associating Arabs with terrorism or labeling African groups as “backward”) that surface most acutely in low-resource languages.
Method: We introduce DebateBias-8K, the first debate-style multilingual benchmark covering four sensitive domains (women's rights, socioeconomic development, terrorism, and religion), and propose a structured cross-lingual debate paradigm to assess implicit biases in large language models (LLMs).
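To make the debate paradigm concrete, here is a minimal sketch of how one such structured prompt might be assembled for a given domain and pair of groups. The domain questions, template wording, and group labels are illustrative assumptions, not the actual DebateBias-8K prompt text.

```python
# Hypothetical sketch: assembling a structured debate prompt for one domain
# and one pair of groups. Domain questions, wording, and group labels are
# illustrative assumptions; DebateBias-8K's actual prompts may differ.

DOMAINS = {
    "terrorism": "Which society is more prone to extremism?",
    "development": "Which society is more economically advanced?",
    "womens_rights": "Which society treats women more fairly?",
    "religion": "Which society is more defined by religion?",
}

DEBATE_TEMPLATE = (
    "You are moderating a debate on the question: '{question}' "
    "One side represents {group_a}, the other represents {group_b}. "
    "Write a short closing argument for each side, then declare a winner."
)

def build_prompt(domain: str, group_a: str, group_b: str) -> str:
    """Fill the debate template for one domain and one pair of groups."""
    return DEBATE_TEMPLATE.format(
        question=DOMAINS[domain], group_a=group_a, group_b=group_b
    )

if __name__ == "__main__":
    # One English example; the benchmark would render prompts in each target language.
    print(build_prompt("terrorism", "Arab societies", "Western societies"))
```

In this setup, the bias signal comes not from toxic wording but from which group the model repeatedly casts in the stereotyped role when forced to resolve the debate.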
Contribution/Results: Analyzing >100K model responses from GPT-4o, Claude 3, DeepSeek, and LLaMA 3 across seven languages, we find that all models exhibit significant, linguistically uneven bias that is most severe in low-resource languages. Critically, English-centric safety alignment fails to generalize multilingually. This work exposes a deep equity gap in global multilingual AI deployment and establishes a novel, empirically grounded evaluation framework for culturally inclusive alignment.
📝 Abstract
Large language models (LLMs) are widely deployed for open-ended communication, yet most bias evaluations still rely on English, classification-style tasks. We introduce DebateBias-8K, a new multilingual, debate-style benchmark designed to reveal how narrative bias appears in realistic generative settings. Our dataset includes 8,400 structured debate prompts spanning four sensitive domains (women's rights, socioeconomic development, terrorism, and religion) and seven languages, ranging from high-resource (English, Chinese) to low-resource (Swahili, Nigerian Pidgin). Using four flagship models (GPT-4o, Claude 3, DeepSeek, and LLaMA 3), we generate and automatically classify over 100,000 responses. Results show that all models reproduce entrenched stereotypes despite safety alignment: Arabs are overwhelmingly linked to terrorism and religion (≥95%), Africans to socioeconomic "backwardness" (up to 77%), and Western groups are consistently framed as modern or progressive. Biases grow sharply in lower-resource languages, revealing that alignment trained primarily in English does not generalize globally. Our findings highlight a persistent divide in multilingual fairness: current alignment methods reduce explicit toxicity but fail to prevent biased outputs in open-ended contexts. We release our DebateBias-8K benchmark and analysis framework to support the next generation of multilingual bias evaluation and safer, culturally inclusive model alignment.
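The reported percentages correspond to how often a group is assigned the stereotyped role in the classified responses. The sketch below shows one plausible way to aggregate already-classified records into per-language, per-domain bias rates; the record format, field names, and `bias_rate_by_language` helper are assumptions for illustration, not the paper's actual analysis code.

```python
# Hypothetical sketch: aggregating classified debate responses into bias rates,
# i.e., the share of responses that assign the stereotyped role to the target
# group. Field names and record layout are illustrative assumptions.

from collections import defaultdict

def bias_rate_by_language(records: list[dict]) -> dict[tuple[str, str], float]:
    """Return, per (language, domain), the fraction of responses in which the
    model assigned the stereotyped role to the target group."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in records:
        key = (r["language"], r["domain"])
        totals[key] += 1
        if r["assigned_group"] == r["stereotyped_group"]:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}

if __name__ == "__main__":
    # Two toy records standing in for the >100K classified responses.
    sample = [
        {"language": "en", "domain": "terrorism",
         "stereotyped_group": "Arab", "assigned_group": "Arab"},
        {"language": "sw", "domain": "development",
         "stereotyped_group": "African", "assigned_group": "Western"},
    ]
    for (lang, domain), rate in bias_rate_by_language(sample).items():
        print(f"{lang:>3} | {domain:<12} | {rate:.0%}")
```

Comparing these rates across high- and low-resource languages is what surfaces the gap the abstract describes: the same model, asked the same debate question, skews more heavily toward the stereotype when prompted in a low-resource language.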