Do Large Language Models Understand Morality Across Cultures?

📅 2025-07-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates large language models’ (LLMs) capacity to comprehend cross-cultural moral concepts. Addressing the lack of systematic evaluation in prior work, we propose a three-pronged validation framework: (1) variance-based analysis of moral score disparities across cultures; (2) clustering alignment to assess consistency between model outputs and empirically grounded cultural structures from the World Values Survey (WVS) and European Social Survey (ESS); and (3) targeted probe prompting using predefined moral word pairs. Experiments span major open- and closed-weight LLMs under diverse cultural scenarios, quantifying moral judgment biases. Results reveal pervasive cross-cultural moral compression—i.e., reduced intercultural variation—in current LLMs, coupled with significantly low alignment to real-world survey data, exposing latent cultural biases. To our knowledge, this is the first systematic demonstration of cultural misalignment in LLM moral modeling, providing a reproducible evaluation framework and empirical foundation for advancing culturally equitable AI.

📝 Abstract
Recent advancements in large language models (LLMs) have established them as powerful tools across numerous domains. However, persistent concerns about embedded biases, such as gender, racial, and cultural biases arising from their training data, raise significant questions about the ethical use and societal consequences of these technologies. This study investigates the extent to which LLMs capture cross-cultural differences and similarities in moral perspectives. Specifically, we examine whether LLM outputs align with patterns observed in international survey data on moral attitudes. To this end, we employ three complementary methods: (1) comparing variances in moral scores produced by models versus those reported in surveys, (2) conducting cluster alignment analyses to assess correspondence between country groupings derived from LLM outputs and survey data, and (3) directly probing models with comparative prompts using systematically chosen token pairs. Our results reveal that current LLMs often fail to reproduce the full spectrum of cross-cultural moral variation, tending to compress differences and exhibit low alignment with empirical survey patterns. These findings highlight a pressing need for more robust approaches to mitigate biases and improve cultural representativeness in LLMs. We conclude by discussing the implications for the responsible development and global deployment of LLMs, emphasizing fairness and ethical alignment.
Problem

Research questions and friction points this paper is trying to address.

Assessing LLMs' understanding of cross-cultural moral perspectives
Evaluating alignment between LLM outputs and international moral survey data
Identifying biases and improving cultural representativeness in LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compare moral score variances between models and surveys
Conduct cluster alignment analyses for country groupings
Probe models with comparative prompts using token pairs
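
The first two methods above can be sketched in miniature. The country codes, scores, and permissive/restrictive threshold below are illustrative placeholders, not the paper's data, and the plain pairwise Rand index stands in for whatever cluster-alignment metric the authors actually use:

```python
from statistics import pvariance

# Hypothetical moral-approval scores (0-10) per country; the real study
# compares LLM outputs against WVS/ESS survey data.
survey = {"NL": 8.1, "US": 6.5, "EG": 2.3, "JP": 5.0, "BR": 4.2}
model  = {"NL": 6.9, "US": 6.4, "EG": 5.8, "JP": 6.1, "BR": 6.0}

# (1) Variance comparison: "moral compression" shows up as the model's
# cross-country variance being much smaller than the survey's.
var_survey = pvariance(survey.values())
var_model = pvariance(model.values())
print(f"survey variance: {var_survey:.2f}, model variance: {var_model:.2f}")

# (2) Cluster alignment: bucket countries into "permissive" (>= 5) vs
# "restrictive" (< 5) groups and measure agreement between the two
# partitions with the pairwise Rand index.
def cluster(scores):
    return {c: int(s >= 5.0) for c, s in scores.items()}

def rand_index(a, b):
    """Fraction of country pairs on which the two clusterings agree."""
    countries = list(a)
    agree = total = 0
    for i, x in enumerate(countries):
        for y in countries[i + 1:]:
            total += 1
            agree += (a[x] == a[y]) == (b[x] == b[y])
    return agree / total

ri = rand_index(cluster(survey), cluster(model))
print(f"pairwise Rand index: {ri:.2f}")
```

With these toy numbers the model's variance is far below the survey's and the Rand index is low, which is the qualitative signature of compression and misalignment the paper reports.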
Hadi Mohammadi
PhD candidate at Utrecht University
Natural Language Processing, Explainable AI, Reinforcement Learning, Computational Social Science
Yasmeen F. S. S. Meijer
Department of Methodology and Statistics, Utrecht University, The Netherlands
Efthymia Papadopoulou
Department of Methodology and Statistics, Utrecht University, The Netherlands
Ayoub Bagheri
Associate Professor, Utrecht University
Natural Language Processing, Computational Linguistics