Are Language Models Consequentialist or Deontological Moral Reasoners?

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work examines the misalignment between the reasoning process and the verbalized outputs of large language models (LLMs) in ethical decision-making. The authors propose a scalable moral reasoning taxonomy and systematically analyze over 600 variants of the trolley problem, examining both chain-of-thought (CoT) reasoning traces and post-hoc explanations. Through fine-grained annotation, contrastive attribution evaluation, and large-scale prompting, they find that LLMs implicitly favor deontological reasoning during internal deliberation, while their post-hoc justifications shift significantly toward consequentialism (p < 0.001), revealing a decoupling between latent reasoning and surface-level explanation. To support rigorous evaluation, they open-source MoralLens, an assessment framework that provides an empirical benchmark for interpretable, ethically aligned LLM behavior in high-stakes applications.
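
As a rough illustration of the annotation step, here is a minimal Python sketch of labeling a single reasoning trace against a two-branch taxonomy. The label names, rubric wording, and the stubbed `judge_trace` heuristic are illustrative assumptions, not code from the released MoralLens framework.

```python
# Sketch: labeling a reasoning trace against a two-branch taxonomy.
# The label set and rubric wording are illustrative, not the paper's
# actual taxonomy; judge_trace stubs out a real LLM-judge call.

TAXONOMY = {
    "consequentialist": ["maximize_lives_saved", "minimize_total_harm"],
    "deontological": ["duty_not_to_kill", "no_instrumental_use_of_persons"],
}

RUBRIC = (
    "You are annotating moral rationales.\n"
    "Labels: {labels}\n"
    "Reasoning trace:\n{trace}\n"
    "Answer with exactly one label."
)

def judge_trace(trace: str) -> str:
    """Stub for an LLM judge; a real pipeline would send `prompt` to a model."""
    prompt = RUBRIC.format(
        labels=", ".join(l for ls in TAXONOMY.values() for l in ls),
        trace=trace,
    )
    # Placeholder keyword heuristic so the sketch runs end to end.
    return ("maximize_lives_saved" if "more lives" in trace
            else "duty_not_to_kill")

def theory_of(label: str) -> str:
    """Map a fine-grained rationale label back to its normative theory."""
    return next(t for t, ls in TAXONOMY.items() if label in ls)

cot = "Pulling the lever saves more lives, so the outcome is better."
label = judge_trace(cot)
print(label, "->", theory_of(label))  # maximize_lives_saved -> consequentialist
```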

📝 Abstract
As AI systems increasingly navigate applications in healthcare, law, and governance, understanding how they handle ethically complex scenarios becomes critical. Previous work has mainly examined the moral judgments in large language models (LLMs), rather than their underlying moral reasoning process. In contrast, we focus on a large-scale analysis of the moral reasoning traces provided by LLMs. Furthermore, unlike prior work that attempted to draw inferences from only a handful of moral dilemmas, our study leverages over 600 distinct trolley problems as probes for revealing the reasoning patterns that emerge within different LLMs. We introduce and test a taxonomy of moral rationales to systematically classify reasoning traces according to two main normative ethical theories: consequentialism and deontology. Our analysis reveals that LLM chains-of-thought tend to favor deontological principles based on moral obligations, while post-hoc explanations shift notably toward consequentialist rationales that emphasize utility. Our framework provides a foundation for understanding how LLMs process and articulate ethical considerations, an important step toward safe and interpretable deployment of LLMs in high-stakes decision-making environments. Our code is available at https://github.com/keenansamway/moral-lens.
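
The reported shift from deontological CoT reasoning to consequentialist post-hoc explanations amounts to comparing label proportions between the two trace types. Below is a minimal, standard-library sketch of such a comparison; the counts are invented for illustration and are not the paper's results.

```python
import math

def two_proportion_ztest(k1: int, n1: int, k2: int, n2: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two proportions."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # p-value from the standard normal CDF, Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: consequentialist labels out of all labeled traces.
cot_conseq, cot_total = 240, 600          # chain-of-thought traces
posthoc_conseq, posthoc_total = 390, 600  # post-hoc explanations

z, p = two_proportion_ztest(posthoc_conseq, posthoc_total, cot_conseq, cot_total)
print(f"z = {z:.2f}, p = {p:.2g}")
```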
Problem

Research questions and friction points this paper is trying to address.

Prior work examines LLMs' final moral judgments rather than the reasoning process behind them
Existing studies draw inferences from only a handful of moral dilemmas, limiting generality
It is unclear whether models' post-hoc explanations reflect the reasoning in their chains-of-thought
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale analysis of moral reasoning traces, rather than final judgments alone
A taxonomy of moral rationales for systematically classifying traces as consequentialist or deontological
A probe set of over 600 distinct trolley problems for surfacing reasoning patterns across LLMs (one way such variants could be generated is sketched below)
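
Neither the abstract nor the summary says how the 600+ variants were constructed. One plausible approach, sketched below purely as an assumption, is filling a trolley-problem template over a grid of factors; the template wording, factor names, and counts here are hypothetical.

```python
from itertools import product

# Hypothetical factor grid; the paper's actual variant construction may differ.
TEMPLATE = (
    "A runaway trolley is headed toward {n_main} {main_group}. "
    "You can pull a lever to divert it toward {n_side} {side_group}. "
    "Do you pull the lever?"
)

groups = ["strangers", "children", "elderly people", "doctors"]
counts = [1, 5]

variants = [
    TEMPLATE.format(n_main=nm, main_group=g1, n_side=ns, side_group=g2)
    for (nm, g1), (ns, g2) in product(product(counts, groups), repeat=2)
    if (nm, g1) != (ns, g2)  # skip degenerate dilemmas with identical tracks
]
print(len(variants))  # 56 distinct prompts from this small grid
```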