Delving into Multilingual Ethical Bias: The MSQAD with Statistical Hypothesis Tests for Large Language Models

📅 2025-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether the ethical biases that large language models (LLMs) exhibit on 17 globally salient human rights topics are language-dependent in multilingual settings. To this end, we introduce MSQAD, the first open-source multilingual benchmark of sensitive questions and answers designed specifically for evaluating ethical bias across languages. We further propose a dual statistical hypothesis testing framework, combining the chi-square test and Fisher's exact test, to rigorously quantify cross-lingual ethical disparities. Experiments cover major LLMs and diverse languages; the null hypothesis of no cross-language bias is rejected for most language–topic pairs, confirming that ethical bias is pervasive and consistent across models and languages. By releasing MSQAD openly, we provide a reproducible, scalable, and standardized evaluation tool for multilingual ethical assessment, addressing a critical gap in open benchmarks for cross-lingual bias analysis.
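
As a concrete illustration of the dual-test idea, here is a minimal Python sketch (not the authors' code; all counts, languages, and response categories below are fabricated for illustration) of testing whether an LLM's response types to sensitive questions are independent of the prompt language, using SciPy's chi-square and Fisher's exact tests.

```python
# Minimal sketch of the paper's dual-test idea (not the authors' code).
# Question: is the distribution of response types independent of the
# language of the prompt? All counts below are fabricated.
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical contingency table: rows = languages, columns = response
# types (e.g., acceptable vs. non-acceptable answers on one topic).
table = np.array([
    [84, 16],  # en
    [61, 39],  # ko
    [72, 28],  # es
])

# Chi-square test of independence over all languages at once.
chi2, p, dof, _expected = chi2_contingency(table)
print(f"chi-square={chi2:.2f}, dof={dof}, p={p:.4g}")

# Fisher's exact test on a pairwise 2x2 slice; preferable when expected
# cell counts are too small for the chi-square approximation to hold.
odds, p_fisher = fisher_exact(table[[0, 1]])
print(f"en vs. ko: odds ratio={odds:.2f}, p={p_fisher:.4g}")

# A small p-value rejects the null hypothesis that response type is
# independent of language, i.e., it signals cross-lingual bias.
```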

📝 Abstract
Despite the recent strides in large language models, studies have underscored the existence of social biases within these systems. In this paper, we delve into the validation and comparison of the ethical biases of LLMs concerning globally discussed and potentially sensitive topics, hypothesizing that these biases may arise from language-specific distinctions. Introducing the Multilingual Sensitive Questions & Answers Dataset (MSQAD), we collected news articles from Human Rights Watch covering 17 topics and generated socially sensitive questions along with corresponding responses in multiple languages. We scrutinized the biases of these responses across languages and topics, employing two statistical hypothesis tests. The results showed that the null hypotheses were rejected in most cases, indicating biases arising from cross-language differences. This demonstrates that ethical biases in responses are widespread across various languages, and notably, these biases were prevalent even among different LLMs. By making the proposed MSQAD openly available, we aim to facilitate future research endeavors focused on examining cross-language biases in LLMs and their variant models.
Problem

Research questions and friction points this paper is trying to address.

Whether the ethical biases LLMs exhibit on socially sensitive topics depend on the language of the prompt
Absence of an open multilingual benchmark for evaluating LLM responses to sensitive questions
Need to verify statistically that cross-language differences in responses are significant rather than noise
Innovation

Methods, ideas, or system contributions that make the work stand out.

Built the Multilingual Sensitive Questions & Answers Dataset (MSQAD) from Human Rights Watch articles on 17 topics, with questions and responses in multiple languages
Applied two statistical hypothesis tests (chi-square and Fisher's exact) to validate cross-language response differences; a sketch follows this list
Showed that ethical biases recur across languages and across different LLMs and their variants
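
Extending the earlier single-table sketch across many topic–language pairs, the following hedged Python sketch shows how the "null hypotheses rejected in most cases" finding could be tabulated. The counts are fabricated, the two topic names are placeholders standing in for the paper's 17 topics, and the Bonferroni correction is an assumption; the paper's exact correction procedure may differ.

```python
# Hedged sketch: tabulating rejections over many (topic, language-pair)
# tests, as in the paper's "rejected in most cases" finding.
# Counts, topics, and the Bonferroni correction are all placeholders.
from itertools import combinations
from scipy.stats import fisher_exact

# counts[topic][lang] = (acceptable, non_acceptable) -- fabricated numbers
counts = {
    "refugees": {"en": (90, 10), "ko": (70, 30), "es": (80, 20)},
    "torture":  {"en": (95, 5),  "ko": (85, 15), "es": (60, 40)},
}

alpha = 0.05
# One Fisher's exact test per topic and unordered language pair.
tests = [
    (topic, a, b, fisher_exact([counts[topic][a], counts[topic][b]])[1])
    for topic in counts
    for a, b in combinations(counts[topic], 2)
]

# Bonferroni correction: compare each p-value against alpha / #tests.
threshold = alpha / len(tests)
for topic, a, b, p in tests:
    verdict = "reject H0 (bias)" if p < threshold else "fail to reject"
    print(f"{topic:9s} {a}-{b}: p={p:.3g} -> {verdict}")
```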