Language Model Alignment in Multilingual Trolley Problems

📅 2024-07-02
🏛️ International Conference on Learning Representations
📈 Citations: 7
Influential: 1
📄 PDF
🤖 AI Summary
Assessing the cross-lingual alignment of large language models (LLMs) with human moral preferences remains underexplored, particularly across diverse linguistic and cultural contexts. Method: We systematically evaluate 19 LLMs in over 100 languages using MultiTP, a novel large-scale multilingual trolley-problem corpus constructed from 40 million human judgments across 200+ countries in the Moral Machine dataset. The approach combines multilingual prompt engineering, cross-lingual semantic alignment, and moral preference modeling along six dimensions (species, gender, health, social status, age, and number of lives), complemented by a prompt-robustness analysis. Contribution/Results: We uncover substantial cross-lingual misalignment: inter-language judgment divergence exceeds 37%, response consistency is highly prompt-sensitive, and model preferences correlate measurably with demographic characteristics of native speakers, challenging the assumption of universally consistent AI ethics. This work establishes the first benchmark for multilingual moral alignment and reveals critical limitations in how current LLMs generalize ethical reasoning.
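The summary's headline metric, inter-language judgment divergence, can be illustrated with a minimal sketch: pose the same dilemmas to one model in several languages and measure the pairwise disagreement rate between the resulting judgment lists. The judgments and language codes below are made-up toy data, not values from the MultiTP release.

```python
# Hypothetical sketch of inter-language judgment divergence.
# Each list holds binary judgments (1 = spare group A, 0 = spare group B)
# for the same five dilemmas posed in one language; values are illustrative.
from itertools import combinations

def disagreement_rate(a, b):
    """Fraction of dilemmas on which two judgment lists differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b)) / len(a)

judgments = {
    "en": [1, 1, 0, 1, 0],
    "de": [1, 0, 0, 1, 0],
    "zh": [0, 0, 1, 1, 0],
}

# Disagreement for every unordered language pair.
pairwise = {
    (l1, l2): disagreement_rate(judgments[l1], judgments[l2])
    for l1, l2 in combinations(judgments, 2)
}
max_divergence = max(pairwise.values())  # worst-case cross-lingual gap
```

A divergence figure like the paper's 37% corresponds to this kind of pairwise disagreement exceeding 0.37 for some language pair.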

📝 Abstract
We evaluate the moral alignment of large language models (LLMs) with human preferences in multilingual trolley problems. Building on the Moral Machine experiment, which captures over 40 million human judgments across 200+ countries, we develop a cross-lingual corpus of moral dilemma vignettes in over 100 languages called MultiTP. This dataset enables the assessment of LLMs' decision-making processes in diverse linguistic contexts. Our analysis explores the alignment of 19 different LLMs with human judgments, capturing preferences across six moral dimensions: species, gender, fitness, status, age, and the number of lives involved. By correlating these preferences with the demographic distribution of language speakers and examining the consistency of LLM responses to various prompt paraphrasings, our findings provide insights into cross-lingual and ethical biases of LLMs and their intersection. We discover significant variance in alignment across languages, challenging the assumption of uniform moral reasoning in AI systems and highlighting the importance of incorporating diverse perspectives in AI ethics. The results underscore the need for further research on the integration of multilingual dimensions in responsible AI research to ensure fair and equitable AI interactions worldwide. Our code and data are at https://github.com/causalNLP/moralmachine
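The abstract describes correlating model preferences with human judgments across six moral dimensions. A minimal sketch of that comparison, with entirely illustrative preference strengths (not numbers from the paper), could look like this:

```python
# Hypothetical sketch: aligning a model's preference strengths with human
# Moral Machine preferences along the abstract's six dimensions.
# All numeric values are invented for illustration; a positive score means
# preferring the first option (e.g., sparing more lives over fewer).
dimensions = ["species", "gender", "fitness", "status", "age", "lives"]

human = [0.58, 0.12, 0.30, 0.25, 0.45, 0.66]  # illustrative human strengths
model = [0.40, 0.05, 0.35, 0.10, 0.50, 0.70]  # illustrative model strengths

def pearson(x, y):
    """Pearson correlation between two equal-length preference vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

alignment = pearson(human, model)  # closer to 1 = better aligned
```

Repeating this per language, then correlating the resulting scores with demographic statistics of each language's speakers, mirrors the analysis the abstract outlines.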
Problem

Research questions and friction points this paper is trying to address.

Evaluates LLM moral alignment with human preferences in multilingual dilemmas
Assesses LLM decision-making across 100+ languages using MultiTP dataset
Examines cross-lingual ethical biases and demographic influences on AI morality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual moral dilemma dataset (MultiTP)
Assessing LLM alignment across 6 moral dimensions
Analyzing cross-lingual ethical biases in AI