🤖 AI Summary
This work identifies human rights compliance risks in large language models (LLMs) arising from training data biases toward WEIRD (Western, Educated, Industrialized, Rich, Democratic) cultural norms. To address this, we propose the first integrated assessment framework that aligns the World Values Survey (WVS) with global human rights instruments—including the Universal Declaration of Human Rights and regional charters from Asia, Africa, and the Middle East—enabling quantitative, culture-ethics evaluation of LLM outputs. We systematically benchmark GPT-3.5, GPT-4, Llama-3, BLOOM, and Qwen across gender equality and fundamental rights dimensions. Results show that reducing WEIRD alignment improves cultural representativeness but increases human rights violations by 2–4%, particularly through reinforcement of harmful traditional gender norms. Our study establishes a novel paradigm for co-evaluating cross-cultural values and human rights benchmarks, providing empirical evidence and methodological foundations for equitable LLM governance.
📝 Abstract
Large language models (LLMs) are often trained on data that reflect WEIRD values: Western, Educated, Industrialized, Rich, and Democratic. This raises concerns about cultural bias and fairness. Using responses to the World Values Survey, we evaluated five widely used LLMs: GPT-3.5, GPT-4, Llama-3, BLOOM, and Qwen. We measured how closely these responses aligned with the values of the WEIRD countries and whether they conflicted with human rights principles. To reflect global diversity, we compared the results with the Universal Declaration of Human Rights and three regional charters from Asia, the Middle East, and Africa. Models with lower alignment to WEIRD values, such as BLOOM and Qwen, produced more culturally varied responses but were 2% to 4% more likely to generate outputs that violated human rights, especially regarding gender and equality. For example, some models agreed with the statements ``a man who cannot father children is not a real man'' and ``a husband should always know where his wife is'', reflecting harmful gender norms. These findings suggest that as cultural representation in LLMs increases, so does the risk of reproducing discriminatory beliefs. Approaches such as Constitutional AI, which could embed human rights principles into model behavior, may only partly help resolve this tension.