MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes

📅 2025-10-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing moral evaluation frameworks overly emphasize decision outcomes while neglecting the underlying reasoning process. Method: We propose a process-oriented paradigm for evaluating moral reasoning, introducing MoReBench, a benchmark of 1,000 expert-annotated moral dilemmas, and design a multidimensional scoring scheme to assess models' capabilities in identifying moral elements, balancing trade-offs, generating recommendations, and applying five major ethical frameworks. We further employ reasoning-trajectory analysis and controlled-variable experiments for structured evaluation. Contribution/Results: Our evaluation reveals systematic reasoning biases and framework-specific preferences across mainstream LMs; moral reasoning competence does not follow conventional scaling laws and is unstable in both autonomous and human-assisted settings. This work shifts AI value-alignment assessment from "whether the answer is correct" to "how the reasoning unfolds," establishing a transparent, interpretable benchmark and methodology for evaluating moral reasoning capabilities.

📝 Abstract
As AI systems progress, we rely more on them to make decisions with us and for us. To ensure that such decisions are aligned with human values, it is imperative to understand not only what decisions they make but also how they come to those decisions. Reasoning language models, which provide both final responses and (partially transparent) intermediate thinking traces, present a timely opportunity to study AI procedural reasoning. Unlike math and code problems, which often have objectively correct answers, moral dilemmas are an excellent testbed for process-focused evaluation because they allow for multiple defensible conclusions. To this end, we present MoReBench: 1,000 moral scenarios, each paired with a set of rubric criteria that experts consider essential to include (or avoid) when reasoning about the scenarios. MoReBench contains over 23 thousand criteria, including identifying moral considerations, weighing trade-offs, and giving actionable recommendations, covering cases of AI advising humans on moral decisions as well as making moral decisions autonomously. Separately, we curate MoReBench-Theory: 150 examples that test whether AI can reason under five major frameworks in normative ethics. Our results show that scaling laws and existing benchmarks on math, code, and scientific reasoning tasks fail to predict models' abilities to perform moral reasoning. Models also show partiality towards specific moral frameworks (e.g., Benthamite Act Utilitarianism and Kantian Deontology), which might be side effects of popular training paradigms. Together, these benchmarks advance process-focused reasoning evaluation towards safer and more transparent AI.
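The evaluation protocol the abstract describes is rubric-based and process-focused: each scenario carries expert criteria that a model's reasoning trace should include or avoid, and the trace is scored against those criteria rather than against a single correct verdict. Below is a minimal sketch of how such rubric scoring could be computed. The data shapes (`Criterion`, `polarity`) and the substring-matching judge are illustrative assumptions for this sketch, not the paper's actual grading pipeline, which presumably relies on expert or LLM-based raters.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    """One expert-written rubric item attached to a scenario.

    polarity is "include" (the trace should satisfy it) or
    "avoid" (the trace should not).
    """
    description: str
    polarity: str

def matches(trace: str, criterion: Criterion) -> bool:
    # Placeholder judge for this sketch: real grading would use
    # expert or LLM raters, not substring matching.
    return criterion.description.lower() in trace.lower()

def rubric_score(trace: str, criteria: list[Criterion]) -> float:
    """Fraction of criteria handled correctly: "include" items present
    in the reasoning trace, "avoid" items absent from it."""
    correct = sum(
        matches(trace, c) == (c.polarity == "include") for c in criteria
    )
    return correct / len(criteria)

# Toy example with hypothetical criteria for one dilemma.
criteria = [
    Criterion("conflict between honesty and loyalty", "include"),
    Criterion("actionable recommendation", "include"),
    Criterion("only one morally correct answer", "avoid"),
]
trace = (
    "There is a clear conflict between honesty and loyalty here. "
    "Balancing the trade-offs, my actionable recommendation is to "
    "tell the friend the truth, privately and with care."
)
print(f"Rubric score: {rubric_score(trace, criteria):.2f}")  # -> 1.00
```

Scoring the trace rather than the verdict is what lets multiple defensible conclusions coexist: two models can recommend opposite actions and still both score well if each surfaces the relevant considerations and trade-offs.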
Problem

Research questions and friction points this paper is trying to address.

Evaluating procedural moral reasoning in language models
Assessing pluralistic moral frameworks in AI systems
Developing benchmarks for transparent moral decision-making processes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark with moral scenarios and rubric criteria
Tests reasoning under five normative ethics frameworks
Evaluates procedural reasoning beyond outcome-based assessments
Yu Ying Chiu
University of Washington
Michael S. Lee
Scale AI
Rachel Calcott
Harvard University
Brandon Handoko
Scale AI
Paul de Font-Reaulx
University of Michigan
Paula Rodriguez
Scale AI
Chen Bo Calvin Zhang
Scale AI
Ziwen Han
Scale AI
Udari Madhushani Sehwag
Research Scientist, Scale AI
Agentic AI, Alignment, Scalable oversight, AI Safety, Multi-agent RL
Yash Maurya
Scale AI
Privacy, AI Safety, Differential Privacy, Fairness, Explainable AI
Christina Q Knight
Scale AI
Harry R. Lloyd
UNC Chapel Hill
Florence Bacus
Harvard University
Mantas Mazeika
Center for AI Safety
ML Safety, AI Safety, Machine Ethics, ML Reliability
Bing Liu
Scale AI
Yejin Choi
Stanford University / NVIDIA
Natural Language Processing, Deep Learning, Artificial Intelligence, Commonsense Reasoning
Mitchell L Gordon
MIT
Sydney Levine
Visiting Research Scientist, Google DeepMind
Moral psychology, cognitive science, AI safety, AI ethics