🤖 AI Summary
Existing value-alignment approaches rely on demographic group labels, perpetuating stereotypes and obscuring individual variation in human values and preferences.
Method: We propose “individualistic alignment”, a paradigm shift toward fine-grained reasoning about personal values, and introduce IndieValueCatalog, the first benchmark for individual value reasoning, derived from the World Values Survey. It requires models to predict an individual’s judgments in novel scenarios based solely on a sample of their self-reported value statements. We design the Value Inequity Index to quantify systematic model bias and establish a fine-grained consistency evaluation protocol. Models (IndieValueReasoners) are trained via supervised fine-tuning and prompt engineering.
Contribution/Results: Empirical evaluation reveals that state-of-the-art language models achieve only 55–65% accuracy on individual-level prediction; demographic features fail to substitute for personalized value representations; and significant global systematic biases persist across regions and value dimensions.
📝 Abstract
Recent calls for pluralistic alignment emphasize that AI systems should address the diverse needs of all people. Yet, efforts in this space often require sorting people into fixed buckets of pre-specified diversity-defining dimensions (e.g., demographics), risking smoothing out individualistic variations or even stereotyping. To achieve an authentic representation of diversity that respects individuality, we propose individualistic alignment. While individualistic alignment can take various forms, we introduce IndieValueCatalog, a dataset transformed from the influential World Values Survey (WVS), to study language models (LMs) on the specific challenge of individualistic value reasoning. Given a sample of an individual's value-expressing statements, models are tasked with predicting this person's value judgments in novel cases. With IndieValueCatalog, we reveal critical limitations in frontier LMs, which achieve only 55% to 65% accuracy in predicting individualistic values. Moreover, our results highlight that a precise description of individualistic values cannot be approximated with demographic information alone. We also identify a partiality of LMs in reasoning about global individualistic values, as measured by our proposed Value Inequity Index (σInequity). Finally, we train a series of IndieValueReasoners to reveal new patterns and dynamics in global human values.
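The abstract does not spell out how the Value Inequity Index is computed. As a rough illustration only, the sketch below assumes that the index is the standard deviation of a model's prediction accuracy across groups of individuals within one dimension (for example, regions), so that a higher value indicates less even performance. The function names `group_accuracies` and `inequity_index`, the record layout, and the choice of population standard deviation are all assumptions made for this example, not the paper's definition.

```python
from collections import defaultdict
from statistics import pstdev


def group_accuracies(records):
    """Return per-group accuracy.

    records: iterable of (group_label, is_correct) pairs, one per
    predicted value judgment. This layout is hypothetical.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, is_correct in records:
        totals[group] += 1
        hits[group] += int(is_correct)
    return {group: hits[group] / totals[group] for group in totals}


def inequity_index(records):
    """Assumed index: population std. dev. of per-group accuracy."""
    accuracies = list(group_accuracies(records).values())
    return pstdev(accuracies)


if __name__ == "__main__":
    # Toy data: the model is right for both judgments in group "A"
    # and for one of two judgments in group "B".
    toy = [("A", True), ("A", True), ("B", True), ("B", False)]
    print(group_accuracies(toy))
    print(inequity_index(toy))
```

Under this assumed definition, a model that predicts equally well for every group scores 0, and the score grows as accuracy diverges between groups.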