Machine Behavior in Relational Moral Dilemmas: Moral Rightness, Predicted Human Behavior, and Model Decisions

📅 2026-04-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates whether large language models (LLMs) capture the context-dependence and social embeddedness of human moral judgment in interpersonal moral dilemmas. By systematically varying crime severity and relational closeness in the Whistleblower's Dilemma, the authors evaluate LLMs from three perspectives: moral rightness, predicted human behavior, and the model's own decision. Moral rightness judgments remain consistently fairness-oriented, whereas predictions of human behavior shift toward loyalty as relational closeness increases. The models' own decisions align with their moral rightness judgments rather than with their behavioral predictions. This inconsistency suggests that LLM decision-making prioritizes rigid, prescriptive rules over the social sensitivity present in the models' internal world-modeling.

📝 Abstract

Human moral judgment is context-dependent and modulated by interpersonal relationships. As large language models (LLMs) increasingly function as decision-support systems, determining whether they encode these social nuances is critical. We characterize machine behavior using the Whistleblower's Dilemma by varying two experimental dimensions: crime severity and relational closeness. Our study evaluates three distinct perspectives: (1) moral rightness (prescriptive norms), (2) predicted human behavior (descriptive social expectations), and (3) autonomous model decision-making. By analyzing the models' reasoning processes, we identify a clear cross-perspective divergence: while moral rightness remains consistently fairness-oriented, predicted human behavior shifts significantly toward loyalty as relational closeness increases. Crucially, model decisions align with moral rightness judgments rather than with the models' own behavioral predictions. This inconsistency suggests that LLM decision-making prioritizes rigid, prescriptive rules over the social sensitivity present in the models' internal world-modeling, a gap that may lead to significant misalignment in real-world deployments.
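The design described in the abstract (two varied dimensions crossed with three perspectives) can be illustrated with a minimal sketch. The abstract does not list the levels of either dimension or the wording of any prompt, so the level names, the question text, and the `build_prompts` helper below are all hypothetical placeholders, not the authors' materials.

```python
from itertools import product

# Hypothetical factor levels: the abstract names the two dimensions
# (crime severity, relational closeness) but not their levels.
SEVERITY_LEVELS = ["minor", "moderate", "severe"]
CLOSENESS_LEVELS = ["stranger", "acquaintance", "close friend"]

# The three perspectives named in the abstract; the question wording is assumed.
PERSPECTIVES = {
    "moral_rightness": "What is the morally right thing to do: report or not report?",
    "predicted_human_behavior": "What would a typical person do: report or not report?",
    "model_decision": "What would you do: report or not report?",
}


def build_prompts():
    """Return one prompt record per (severity, closeness, perspective) cell."""
    records = []
    for severity, closeness, (perspective, question) in product(
        SEVERITY_LEVELS, CLOSENESS_LEVELS, PERSPECTIVES.items()
    ):
        # Placeholder scenario text; only the two factors vary across cells.
        scenario = (
            "Someone has committed an offense, and one person witnessed it. "
            f"Offense severity: {severity}. "
            f"Relationship of the offender to the witness: {closeness}."
        )
        records.append(
            {
                "severity": severity,
                "closeness": closeness,
                "perspective": perspective,
                "prompt": f"{scenario}\n{question}",
            }
        )
    return records


if __name__ == "__main__":
    prompts = build_prompts()
    print(len(prompts), "conditions")
    print(prompts[0]["prompt"])
```

Holding the scenario text fixed and changing only the final question keeps the three perspectives comparable within each severity-by-closeness cell, which is what the cross-perspective comparison in the abstract requires.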

Problem

Research questions and friction points this paper is trying to address.

machine behavior

moral dilemmas

relational closeness

large language models

moral judgment

Innovation

Methods, ideas, or system contributions that make the work stand out.

relational moral dilemmas

large language models

moral judgment