Can Persona-Prompted LLMs Emulate Subgroup Values? An Empirical Analysis of Generalisability and Fairness in Cultural Alignment

📅 2026-04-14
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This study addresses the limited cultural adaptability of mainstream large language models, which are predominantly aligned with Western-centric values and struggle to emulate the values of distinct demographic subgroups. Taking Singapore as a case study, the authors build a dataset of over 20,000 samples from the World Values Survey (WVS) to train and evaluate models on subgroup value prediction, including generalization to unseen, out-of-distribution subgroups. Using persona prompting, fine-tuning on structured numerical preferences, and distance-aware fairness metrics, they find that GPT-4.1 reaches only 57.4% accuracy in predicting subgroup modal preferences. Fine-tuning improves accuracy on unseen subgroups by an average of 17.4%, but models already emulate young, male, Chinese, and Christian personas better than others, and fine-tuning widens the disparity between subgroups under distance-aware metrics, revealing significant fairness challenges in fine-grained alignment.

📝 Abstract
Despite their global prevalence, many Large Language Models (LLMs) are aligned to a monolithic, often Western-centric set of values. This paper investigates the more challenging task of fine-grained value alignment: examining whether LLMs can emulate the distinct cultural values of demographic subgroups. Using Singapore as a case study and the World Values Survey (WVS), we examine the value landscape and show that even state-of-the-art models like GPT-4.1 achieve only 57.4% accuracy in predicting subgroup modal preferences. We construct a dataset of over 20,000 samples to train and evaluate a range of models. We demonstrate that simple fine-tuning on structured numerical preferences yields substantial gains, improving accuracy on unseen, out-of-distribution subgroups by an average of 17.4%. These gains partially transfer to open-ended generation. However, we find significant pre-existing performance biases, where models better emulate young, male, Chinese, and Christian personas. Furthermore, while fine-tuning improves average performance, it widens the disparity between subgroups when measured by distance-aware metrics. Our work offers insights into the limits and fairness implications of subgroup-level cultural alignment.
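
The abstract contrasts accuracy on subgroup modal preferences with distance-aware metrics but does not define either here. The following is a minimal sketch of how the two kinds of measure can disagree, assuming ordinal survey answers on a fixed numeric scale. The function names, the tie-breaking rule, the normalization, and the toy data are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter


def modal_preference(responses):
    """Most frequent answer in a subgroup; ties go to the smallest value (an assumption)."""
    counts = Counter(responses)
    top = max(counts.values())
    return min(value for value, count in counts.items() if count == top)


def exact_accuracy(predicted, modal):
    """Fraction of questions where the prediction equals the subgroup's modal answer."""
    return sum(p == m for p, m in zip(predicted, modal)) / len(modal)


def normalized_distance(predicted, modal, scale_min, scale_max):
    """Mean absolute gap between prediction and mode, divided by the scale width."""
    span = scale_max - scale_min
    total_gap = sum(abs(p - m) for p, m in zip(predicted, modal))
    return total_gap / (span * len(modal))


# Toy answers on a 1-4 scale, invented for illustration only.
subgroup_answers = {
    "q1": [1, 2, 2, 3, 2],
    "q2": [4, 4, 3, 4, 1],
    "q3": [1, 1, 2, 4, 4],  # tie between 1 and 4, resolved to 1
}
modal = [modal_preference(answers) for answers in subgroup_answers.values()]

# Two hypothetical models: each misses only q2, but by different margins.
model_a = [2, 3, 1]
model_b = [2, 1, 1]

for name, preds in (("model_a", model_a), ("model_b", model_b)):
    acc = exact_accuracy(preds, modal)
    dist = normalized_distance(preds, modal, scale_min=1, scale_max=4)
    print(f"{name}: accuracy={acc:.3f}, normalized distance={dist:.3f}")
```

In this toy setup both hypothetical models have the same exact-match accuracy, while the distance-based measure separates a near miss from a far miss. This is one way a gap between subgroups could look different under a distance-aware metric than under plain accuracy.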

Problem

Research questions and friction points this paper is trying to address.

value alignment
subgroup values
cultural alignment
fairness
Large Language Models

Innovation

Methods, ideas, or system contributions that make the work stand out.

subgroup value alignment
persona prompting
fine-tuning on numerical preferences
fairness in LLMs
out-of-distribution generalization