🤖 AI Summary
This work addresses the lack of systematic evaluation of the implicit value preferences that large language models (LLMs) exhibit in everyday moral dilemmas. We introduce DailyDilemmas, a benchmark dataset of 1,360 annotated dilemmas spanning interpersonal, occupational, and environmental scenarios. Methodologically, we analyze model choices through five interdisciplinary value frameworks, including Moral Foundations Theory and the World Values Survey, and compare the resulting value rankings against published principles (OpenAI's ModelSpec and Anthropic's Constitutional AI). Our findings show that LLMs consistently prioritize self-expression and care values, yet disagree substantially on core values such as truthfulness: Mixtral-8x7B neglects it by 9.7% while GPT-4-turbo selects it by 9.4%, a gap of 19.1 percentage points. Notably, the principles published by OpenAI and Anthropic diverge from their models' empirically observed value rankings. Furthermore, system prompts prove ineffective at steering value prioritization. This study delivers a reproducible evaluation benchmark and a theoretically grounded analytical toolkit for advancing LLM value alignment.
📝 Abstract
As users increasingly seek guidance from LLMs for decision-making in daily life, many of these decisions are not clear-cut and depend heavily on people's personal values and ethical standards. We present DailyDilemmas, a dataset of 1,360 moral dilemmas encountered in everyday life. Each dilemma presents two possible actions, along with the affected parties and the relevant human values for each action. Based on these dilemmas, we gather a repository of human values covering diverse everyday topics, such as interpersonal relationships, workplace, and environmental issues. With DailyDilemmas, we evaluate which action LLMs choose in each dilemma and the values those choices represent. Then, we analyze values through the lens of five theoretical frameworks inspired by sociology, psychology, and philosophy: the World Values Survey, Moral Foundations Theory, Maslow's Hierarchy of Needs, Aristotle's Virtues, and Plutchik's Wheel of Emotions. For instance, we find LLMs are most aligned with self-expression over survival in the World Values Survey and care over loyalty in Moral Foundations Theory. Interestingly, we find substantial preference differences between models for some core values. For example, for truthfulness, Mixtral-8x7B neglects it by 9.7% while GPT-4-turbo selects it by 9.4%. We also study the recent guidance released by OpenAI (ModelSpec) and Anthropic (Constitutional AI) to understand how their designated principles reflect their models' actual value prioritization when facing nuanced moral reasoning in daily-life settings. Finally, we find that end users cannot effectively steer such prioritization using system prompts.
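To make the evaluation setup concrete, here is a minimal Python sketch of how a two-action dilemma and a per-value preference score might be represented. The record layout, the field names (`situation`, `values_by_action`), and the scoring rule (times selected minus times neglected, divided by the number of dilemmas where the value appears on only one side) are illustrative assumptions, not the dataset's actual schema or the paper's metric.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Dilemma:
    """Hypothetical record layout; field names are assumptions."""
    situation: str
    # Human values attached to action 0 and to action 1, in that order.
    values_by_action: tuple


def net_value_preference(dilemmas, choices):
    """Score each value in [-1, 1] from a model's choices (assumed metric).

    choices[i] is 0 or 1: the index of the action chosen for dilemmas[i].
    A value counts as selected when it belongs only to the chosen action,
    and as neglected when it belongs only to the rejected action.
    """
    if len(dilemmas) != len(choices):
        raise ValueError("need exactly one choice per dilemma")
    selected, neglected = Counter(), Counter()
    for dilemma, choice in zip(dilemmas, choices):
        chosen = dilemma.values_by_action[choice]
        rejected = dilemma.values_by_action[1 - choice]
        selected.update(chosen - rejected)
        neglected.update(rejected - chosen)
    return {
        value: (selected[value] - neglected[value])
        / (selected[value] + neglected[value])
        for value in set(selected) | set(neglected)
    }


# Toy data, invented for illustration only.
toy_dilemmas = [
    Dilemma("Tell a friend the truth or cover for them",
            (frozenset({"truthfulness"}), frozenset({"loyalty"}))),
    Dilemma("Report an error or protect your own position",
            (frozenset({"truthfulness", "care"}), frozenset({"self-interest"}))),
]
toy_scores = net_value_preference(toy_dilemmas, [0, 1])
print(toy_scores)
```

Under this assumed rule, a positive score means the model tends to pick actions that carry the value and a negative score means it tends to pick against it, which mirrors the "selects" versus "neglects" wording in the abstract.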