Lessons Without Borders? Evaluating Cultural Alignment of LLMs Using Multilingual Story Moral Generation

📅 2026-04-09

🤖 AI Summary

This work examines whether large language models capture the cultural variation in how people interpret the morals of stories. It introduces multilingual story moral generation as an evaluation task for cultural alignment, built on a new dataset of human-written story morals collected across 14 language-culture pairs. Using semantic similarity, a human preference survey, and value categorization, the study compares the outputs of frontier models such as GPT-4o and Gemini with human interpretations. Model-generated morals are semantically similar to human responses and are preferred by human evaluators, but they vary markedly less across languages than human-written morals and concentrate on a narrower set of widely shared values. By treating narrative interpretation as an evaluative task, the work offers an approach to studying cultural alignment beyond static benchmarks and knowledge-based tests.
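One way to make "concentrating on a narrower set of values" concrete is to compare the entropy of the value labels assigned to human and model morals. The sketch below is only an illustration under stated assumptions: the label names and example data are invented, and the source does not say which value taxonomy or concentration statistic the authors use.

```python
import math
from collections import Counter


def value_entropy(labels):
    """Shannon entropy (in bits) of a list of value-category labels.

    Higher entropy means the morals are spread over more value
    categories; lower entropy means they concentrate on a few.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    # p * log2(1/p) is non-negative for every category.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Invented labels for illustration only (not the paper's taxonomy or data).
human_labels = ["benevolence", "tradition", "achievement", "security"]
model_labels = ["universalism", "universalism", "universalism", "benevolence"]

human_entropy = value_entropy(human_labels)
model_entropy = value_entropy(model_labels)
print(f"human value entropy: {human_entropy:.3f} bits")
print(f"model value entropy: {model_entropy:.3f} bits")
```

In this toy data the human labels are spread evenly over four categories while the model labels pile up on one, so the model's entropy is lower. The same comparison could be run per language-culture pair.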

📝 Abstract

Stories are key to transmitting values across cultures, but their interpretation varies across linguistic and cultural contexts. Thus, we introduce multilingual story moral generation as a novel culturally grounded evaluation task. Using a new dataset of human-written story morals collected across 14 language-culture pairs, we compare model outputs with human interpretations via semantic similarity, a human preference survey, and value categorization. We show that frontier models such as GPT-4o and Gemini generate story morals that are semantically similar to human responses and preferred by human evaluators. However, their outputs exhibit markedly less cross-linguistic variation and concentrate on a narrower set of widely shared values. These findings suggest that while contemporary models can approximate central tendencies of human moral interpretation, they struggle to reproduce the diversity that characterizes human narrative understanding. By framing narrative interpretation as an evaluative task, this work introduces a new approach to studying cultural alignment in language models beyond static benchmarks or knowledge-based tests.
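The cross-linguistic variation comparison can be sketched as the mean pairwise distance between the morals written for the same story in different language-culture groups. The example below is a minimal sketch, not the authors' pipeline: it assumes the morals have already been translated into a common language, uses a toy bag-of-words vector as a stand-in for a real sentence embedding model, and runs on invented example data with hypothetical group names.

```python
import math
from collections import Counter
from itertools import combinations


def bow_vector(text):
    """Toy bag-of-words vector; a stand-in for a real sentence embedding."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cross_group_variation(morals_by_group):
    """Mean pairwise cosine distance (1 - similarity) across groups."""
    vectors = [bow_vector(m) for m in morals_by_group.values()]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Invented morals for one story (hypothetical groups, not the paper's data).
human_morals = {
    "group_a": "honesty is rewarded in the end",
    "group_b": "respect your elders and family",
    "group_c": "hard work brings good fortune",
}
model_morals = {
    "group_a": "honesty is always the best policy",
    "group_b": "honesty is the best policy",
    "group_c": "honesty is the best policy in life",
}

human_variation = cross_group_variation(human_morals)
model_variation = cross_group_variation(model_morals)
print(f"human variation: {human_variation:.3f}")
print(f"model variation: {model_variation:.3f}")
```

A lower score for the model than for the human morals would correspond to the pattern the abstract describes: outputs that stay close to each other regardless of language. With a real embedding model, only `bow_vector` would need to change.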

Problem

Research questions and friction points this paper is trying to address.

cultural alignment

multilingual story moral generation

cross-linguistic variation

value diversity

narrative interpretation

Innovation

Methods, ideas, or system contributions that make the work stand out.

multilingual story moral generation

cultural alignment

large language models