Cultural Authenticity: Comparing LLM Cultural Representations to Native Human Expectations

📅 2026-04-03
🤖 AI Summary
This study addresses a critical gap in the cultural evaluation of large language models (LLMs), which has predominantly emphasized diversity and factual accuracy while overlooking how local populations perceive and prioritize their own cultural values. To bridge this gap, the authors propose a human-centered evaluation framework that constructs "cultural importance vectors" from open-ended survey responses across nine countries as human benchmarks. They design a syntactically diverse prompt set to elicit corresponding "cultural representation vectors" from three state-of-the-art LLMs and quantify alignment between model outputs and local cultural expectations through vector similarity. This importance–representation alignment mechanism moves beyond the limits of conventional diversity and accuracy metrics. Empirical results reveal a pervasive Western-centric bias: alignment decreases with greater cultural distance from the United States, and the models' error signatures are highly correlated (ρ > 0.97), with all three overemphasizing superficial cultural symbols while underrepresenting deeper societal values.
📝 Abstract
Cultural representation in Large Language Model (LLM) outputs has primarily been evaluated through the proxies of cultural diversity and factual accuracy. However, a crucial gap remains in assessing cultural alignment: the degree to which generated content mirrors how native populations perceive and prioritize their own cultural facets. In this paper, we introduce a human-centered framework to evaluate the alignment of LLM generations with local expectations. First, we establish a human-derived ground-truth baseline of importance vectors, called Cultural Importance Vectors, based on an induced set of culturally significant facets from open-ended survey responses collected across nine countries. Next, we introduce a method to compute model-derived Cultural Representation Vectors of an LLM based on a syntactically diversified prompt set and apply it to three frontier LLMs (Gemini 2.5 Pro, GPT-4o, and Claude 3.5 Haiku). Our investigation of the alignment between the human-derived Cultural Importance and model-derived Cultural Representation Vectors reveals a Western-centric calibration for some of the models, where alignment decreases as a country's cultural distance from the US increases. Furthermore, we identify highly correlated, systemic error signatures ($\rho > 0.97$) across all models, which over-index on some cultural markers while neglecting the deep-seated social and value-based priorities of users. Our approach moves beyond simple diversity metrics toward evaluating the fidelity of AI-generated content in authentically capturing the nuanced hierarchies of global cultures.
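The abstract describes alignment as a similarity between a human-derived Cultural Importance Vector and a model-derived Cultural Representation Vector over a shared set of cultural facets. A minimal sketch of that comparison, assuming cosine similarity as the metric (the paper says only "vector similarity"; the facet names and weights below are hypothetical, not from the study):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical cultural facets and weights (e.g., fraction of survey
# mentions vs. fraction of model-output mentions for one country).
facets = ["food", "festivals", "family", "religion", "social_values"]
cultural_importance = [0.15, 0.10, 0.30, 0.20, 0.25]      # human-derived
cultural_representation = [0.35, 0.30, 0.15, 0.12, 0.08]  # model-derived

# Lower alignment here reflects the paper's finding: superficial symbols
# (food, festivals) over-indexed relative to deeper value-based priorities.
alignment = cosine_similarity(cultural_importance, cultural_representation)
```

In this toy example the model over-weights surface-level facets, so the alignment score falls well below 1.0, which is the kind of gap the framework is designed to expose.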
Problem

Research questions and friction points this paper is trying to address.

Cultural Authenticity
Large Language Models
Cultural Alignment
Cultural Representation
Human Expectations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cultural Alignment
Cultural Importance Vectors
Large Language Models
Cross-cultural Evaluation
Human-centered AI
Erin MacMurray van Liemt
Google Research
Aida Davani
Google Research
Sinchana Kumbale
Google
Neha Dixit
Google
Sunipa Dev
Senior Research Scientist, Google
Natural Language Processing · Responsible AI · Machine Learning · Algorithmic Fairness