MoralityGym: A Benchmark for Evaluating Hierarchical Moral Alignment in Sequential Decision-Making Agents

📅 2026-02-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of computable and evaluable moral alignment in current AI systems when confronted with hierarchical and potentially conflicting human ethical norms. The authors propose Morality Chains, a formal framework that models hierarchical moral rules as ordered deontic constraints, and introduce MoralityGym—a Gymnasium-based benchmark comprising 98 ethically challenging scenarios—to enable decoupled evaluation of task performance and moral judgment. Additionally, they develop a Morality Metric grounded in insights from psychology and philosophy to quantify an agent’s moral reasoning capabilities in sequential decision-making. Experimental results reveal significant limitations of existing safe reinforcement learning approaches in handling complex moral reasoning, thereby laying a foundation for the development of reliable, transparent, and ethically aligned AI systems.

📝 Abstract
Evaluating moral alignment in agents navigating conflicting, hierarchically structured human norms is a critical challenge at the intersection of AI safety, moral philosophy, and cognitive science. We introduce Morality Chains, a novel formalism for representing moral norms as ordered deontic constraints, and MoralityGym, a benchmark of 98 ethical-dilemma problems presented as trolley-dilemma-style Gymnasium environments. By decoupling task-solving from moral evaluation and introducing a novel Morality Metric, MoralityGym allows the integration of insights from psychology and philosophy into the evaluation of norm-sensitive reasoning. Baseline results with Safe RL methods reveal key limitations, underscoring the need for more principled approaches to ethical decision-making. This work provides a foundation for developing AI systems that behave more reliably, transparently, and ethically in complex real-world contexts.
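The abstract describes Morality Chains as moral norms represented by ordered deontic constraints, with evaluation decoupled from task-solving. A minimal sketch of that idea, assuming a lexicographic reading of the ordering (higher-ranked norms dominate lower-ranked ones); the class names, transition dictionaries, and comparison rule here are illustrative assumptions, not the paper's actual API:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class DeonticConstraint:
    """One norm in a morality chain (hypothetical structure)."""
    name: str
    violated: Callable[[Dict], bool]  # predicate over a single transition

def violation_profile(chain: List[DeonticConstraint],
                      trajectory: List[Dict]) -> Tuple[int, ...]:
    # Count violations per constraint, listed in priority order.
    return tuple(sum(c.violated(t) for t in trajectory) for c in chain)

def morally_preferred(chain: List[DeonticConstraint],
                      traj_a: List[Dict], traj_b: List[Dict]) -> bool:
    # Lexicographic comparison: a trajectory that violates a higher-ranked
    # norm less often is preferred, regardless of lower-ranked norms.
    return violation_profile(chain, traj_a) <= violation_profile(chain, traj_b)

# Toy chain: "do not harm" outranks "do not deceive".
chain = [
    DeonticConstraint("do-not-harm", lambda t: t.get("harm", 0) > 0),
    DeonticConstraint("do-not-deceive", lambda t: t.get("lie", False)),
]
traj_a = [{"harm": 0, "lie": True}]   # deceives, but harms no one
traj_b = [{"harm": 1, "lie": False}]  # honest, but causes harm

print(violation_profile(chain, traj_a))        # (0, 1)
print(morally_preferred(chain, traj_a, traj_b))  # True
```

In a MoralityGym-style setup, such a profile could be computed alongside the environment's task reward, keeping moral evaluation separate from task performance as the paper proposes.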
Problem

Research questions and friction points this paper is trying to address.

moral alignment
sequential decision-making
hierarchical moral norms
ethical dilemmas
AI safety
Innovation

Methods, ideas, or system contributions that make the work stand out.

Morality Chains
MoralityGym
moral alignment
deontic constraints
ethical decision-making
Simon Rosen
University of the Witwatersrand, Johannesburg, South Africa
Siddarth Singh
Research engineer at InstaDeep Ltd
Reinforcement Learning
Ebenezer Gelo
University of the Witwatersrand, Johannesburg, South Africa
Helen Sarah Robertson
University of the Witwatersrand, Johannesburg, South Africa
Ibrahim Suder
University of the Witwatersrand, Johannesburg, South Africa
Victoria Williams
University of the Witwatersrand, Johannesburg, South Africa
Benjamin Rosman
Professor at the University of the Witwatersrand, South Africa
Robotics, Artificial Intelligence, Machine Learning, Decision Making, Reinforcement Learning
Geraud Nangue Tasse
University of the Witwatersrand
Reinforcement Learning, Deep Learning, Machine Learning
Steven James
University of the Witwatersrand
Artificial Intelligence, Reinforcement Learning