AI Summary
This study addresses the limited cultural adaptability of mainstream large language models, which predominantly align with Western-centric values and struggle to generalize across diverse cultural subgroups. Taking Singapore as a case study, the authors construct a dataset of over 20,000 samples based on the World Values Survey to evaluate and quantify models' out-of-distribution generalization of cultural values. Using role-based prompting, fine-tuning on structured numerical preferences, and distance-aware fairness metrics, they find that GPT-4.1 achieves only 57.4% accuracy in predicting the modal preferences of subgroups. Fine-tuning improves accuracy on unseen, out-of-distribution subgroups by an average of 17.4%. However, models already emulate young, male, Chinese, and Christian personas better than others, and fine-tuning widens the disparity between subgroups when measured by distance-aware metrics, revealing significant fairness challenges in fine-grained alignment.
Abstract
Despite their global prevalence, many Large Language Models (LLMs) are aligned to a monolithic, often Western-centric set of values. This paper investigates the more challenging task of fine-grained value alignment: examining whether LLMs can emulate the distinct cultural values of demographic subgroups. Using Singapore as a case study and the World Values Survey (WVS), we examine the value landscape and show that even state-of-the-art models like GPT-4.1 achieve only 57.4% accuracy in predicting subgroup modal preferences. We construct a dataset of over 20,000 samples to train and evaluate a range of models. We demonstrate that simple fine-tuning on structured numerical preferences yields substantial gains, improving accuracy on unseen, out-of-distribution subgroups by an average of 17.4%. These gains partially transfer to open-ended generation. However, we find significant pre-existing performance biases, where models better emulate young, male, Chinese, and Christian personas. Furthermore, while fine-tuning improves average performance, it widens the disparity between subgroups when measured by distance-aware metrics. Our work offers insights into the limits and fairness implications of subgroup-level cultural alignment.
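To make the two kinds of measurement mentioned above concrete, the following is a minimal sketch of how modal-preference accuracy and a distance-aware error could be computed per subgroup. It is an illustration under stated assumptions, not the paper's implementation: the function names (`modal_answer`, `evaluate`, `disparity`), the tie-breaking rule, the use of mean absolute distance on an ordinal answer scale, and the toy data are all hypothetical, and the abstract does not specify the exact metric definitions.

```python
from collections import Counter


def modal_answer(responses):
    """Return the most frequent answer in a subgroup.

    Assumption: ties are broken by taking the smallest answer value.
    """
    counts = Counter(responses)
    top = max(counts.values())
    return min(answer for answer, count in counts.items() if count == top)


def evaluate(predictions, survey):
    """Score model predictions against subgroup modal answers.

    predictions: {(subgroup, question): predicted ordinal answer}
    survey:      {(subgroup, question): list of respondent answers}
    Returns {subgroup: (accuracy, mean absolute distance)}.
    """
    hits, dist, n = {}, {}, {}
    for (group, question), pred in predictions.items():
        mode = modal_answer(survey[(group, question)])
        hits[group] = hits.get(group, 0) + (pred == mode)
        # Distance-aware term: a near miss costs less than a far miss.
        dist[group] = dist.get(group, 0) + abs(pred - mode)
        n[group] = n.get(group, 0) + 1
    return {g: (hits[g] / n[g], dist[g] / n[g]) for g in n}


def disparity(per_group, index):
    """Gap between the best and worst subgroup on one metric.

    index 0 selects accuracy, index 1 selects mean absolute distance.
    """
    values = [metrics[index] for metrics in per_group.values()]
    return max(values) - min(values)


# Toy data (hypothetical): two subgroups, two questions on a 1-4 scale.
survey = {
    ("A", "q1"): [1, 1, 2],
    ("A", "q2"): [3, 3, 4],
    ("B", "q1"): [4, 4, 1],
    ("B", "q2"): [2, 2, 2],
}
predictions = {("A", "q1"): 1, ("A", "q2"): 3, ("B", "q1"): 1, ("B", "q2"): 2}

results = evaluate(predictions, survey)
accuracy_gap = disparity(results, 0)
distance_gap = disparity(results, 1)
```

In this toy setup, exact-match accuracy treats every wrong answer alike, while the distance term also records how far off a wrong answer is. That difference is one plausible reason a disparity between subgroups can look different under distance-aware metrics than under accuracy alone.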