Knowing But Not Doing: Convergent Morality and Divergent Action in LLMs

📅 2026-01-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates whether large language models (LLMs) exhibit value-behavior consistency, i.e., alignment between their self-reported values and their actual decision-making in realistic scenarios. Grounded in Schwartz's theory of basic human values, we construct ValAct-15k, a dataset built from 3,000 Reddit-based advice-seeking scenarios, and conduct a comprehensive analysis integrating scenario-based questions, a traditional value questionnaire, cross-cultural comparisons across ten LLMs (five from U.S. companies and five from Chinese ones), and human benchmarking. Our findings reveal a significant "value-action gap" in LLMs: models show near-perfect cross-model agreement in scenario-based decisions (r ≈ 1.0), yet their stated values correlate only weakly with their enacted behavior (r = 0.3). Moreover, instructing a model to "hold" a specific value degrades its performance by up to 6.6% compared with merely selecting that value, challenging prevailing assumptions about the efficacy of current alignment training paradigms.
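
As a rough sketch (not the authors' released code), the value-action gap described above can be quantified by correlating a model's self-reported value scores from the questionnaire with the value scores implied by its scenario decisions across the ten Schwartz values; the names and numbers below are illustrative assumptions, not data from the paper.

```python
# Minimal sketch, assuming stated (questionnaire) and enacted (scenario-derived)
# scores are already aggregated into one number per Schwartz value.
# All scores below are hypothetical placeholders.
from scipy.stats import pearsonr

SCHWARTZ_VALUES = [
    "self-direction", "stimulation", "hedonism", "achievement", "power",
    "security", "conformity", "tradition", "benevolence", "universalism",
]

stated = {"self-direction": 0.9, "stimulation": 0.4, "hedonism": 0.3,
          "achievement": 0.7, "power": 0.2, "security": 0.8,
          "conformity": 0.6, "tradition": 0.5, "benevolence": 0.9,
          "universalism": 0.8}

enacted = {"self-direction": 0.6, "stimulation": 0.5, "hedonism": 0.5,
           "achievement": 0.8, "power": 0.4, "security": 0.6,
           "conformity": 0.7, "tradition": 0.4, "benevolence": 0.7,
           "universalism": 0.6}

x = [stated[v] for v in SCHWARTZ_VALUES]
y = [enacted[v] for v in SCHWARTZ_VALUES]
r, p = pearsonr(x, y)  # a weak r (~0.3 in the paper) signals a value-action gap
print(f"stated vs. enacted Pearson r = {r:.2f} (p = {p:.3f})")
```

A correlation near 1.0 would mean stated values predict behavior; the paper's reported r = 0.3 for LLMs corresponds to only a weak association.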

📝 Abstract
Value alignment is central to the development of safe and socially compatible artificial intelligence. However, how Large Language Models (LLMs) represent and enact human values in real-world decision contexts remains under-explored. We present ValAct-15k, a dataset of 3,000 advice-seeking scenarios derived from Reddit, designed to elicit the ten values defined by Schwartz's Theory of Basic Human Values. Using both scenario-based questions and a traditional value questionnaire, we evaluate ten frontier LLMs (five from U.S. companies, five from Chinese ones) and human participants ($n = 55$). We find near-perfect cross-model consistency in scenario-based decisions (Pearson $r \approx 1.0$), contrasting sharply with the broad variability observed among humans ($r \in [-0.79, 0.98]$). Yet both humans and LLMs show weak correspondence between self-reported and enacted values ($r = 0.4$ and $0.3$, respectively), revealing a systematic knowledge-action gap. When instructed to "hold" a specific value, LLMs' performance declines by up to $6.6\%$ compared to merely selecting the value, indicating a role-play aversion. These findings suggest that while alignment training yields normative value convergence, it does not eliminate the human-like incoherence between knowing and acting upon values.
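
One plausible way to read the cross-model consistency result is as pairwise Pearson correlations between models' decision vectors over the shared scenario set; the sketch below uses made-up model names and toy decisions to illustrate that computation under this assumption, not the authors' exact pipeline.

```python
# Minimal sketch of cross-model consistency: pairwise Pearson correlation of
# decision vectors over a shared scenario set. Model names, the 0/1 decision
# encoding, and the toy data are assumptions for illustration only.
import numpy as np

decisions = {
    "model_a": np.array([1, 0, 1, 1, 0, 1, 0, 1]),
    "model_b": np.array([1, 0, 1, 1, 0, 1, 0, 1]),
    "model_c": np.array([1, 0, 1, 0, 0, 1, 1, 1]),
}

names = list(decisions)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = np.corrcoef(decisions[a], decisions[b])[0, 1]
        print(f"{a} vs {b}: r = {r:.2f}")  # values near 1.0 = near-identical behavior
```

Under this reading, the reported $r \approx 1.0$ across models contrasts with the wide human range of $[-0.79, 0.98]$, i.e., the models behave almost interchangeably while humans diverge.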
Problem

Research questions and friction points this paper is trying to address.

value alignment
knowledge-action gap
large language models
moral behavior
human values
Innovation

Methods, ideas, or system contributions that make the work stand out.

value alignment
knowledge-action gap
large language models
moral behavior
ValAct-15k