🤖 AI Summary
This work addresses the absence of computable, evaluable notions of moral alignment in current AI systems confronted with hierarchical and potentially conflicting human ethical norms. The authors propose Morality Chains, a formal framework that models hierarchical moral rules as ordered deontic constraints, and introduce MoralityGym, a Gymnasium-based benchmark of 98 ethically challenging scenarios that decouples the evaluation of task performance from moral judgment. They also develop a Morality Metric, grounded in insights from psychology and philosophy, to quantify an agent's moral reasoning in sequential decision-making. Experimental results reveal significant limitations of existing safe reinforcement learning approaches on complex moral reasoning, laying a foundation for reliable, transparent, and ethically aligned AI systems.
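The paper's exact formalism is not reproduced here, but a minimal sketch of what "hierarchical moral rules as ordered deontic constraints" could look like in code is given below. The names `Deontic`, `Constraint`, and `first_violation` are illustrative assumptions, not the authors' API; the key idea shown is that constraints earlier in the chain dominate later ones.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Sequence, Tuple


class Deontic(Enum):
    """Deontic modality of a single moral rule."""
    OBLIGATION = "obligation"    # the predicate must hold
    PROHIBITION = "prohibition"  # the predicate must not hold


@dataclass(frozen=True)
class Constraint:
    """One rule in a morality chain; earlier chain positions outrank later ones."""
    name: str
    modality: Deontic
    predicate: Callable[[object, object], bool]  # evaluated on a (state, action) pair


def first_violation(
    chain: Sequence[Constraint],
    trajectory: Sequence[Tuple[object, object]],
) -> Optional[Constraint]:
    """Return the highest-priority constraint violated anywhere on a trajectory.

    Because the chain is ordered, violating chain[0] is judged worse than any
    number of violations of chain[1], and so on down the hierarchy.
    """
    for constraint in chain:  # scan in priority order
        for state, action in trajectory:
            holds = constraint.predicate(state, action)
            if (constraint.modality is Deontic.OBLIGATION and not holds) or (
                constraint.modality is Deontic.PROHIBITION and holds
            ):
                return constraint
    return None
```

Under this reading, a chain such as `[do_not_harm, keep_promises]` would rank any harm as strictly worse than any broken promise, which is the kind of lexical ordering a trolley-style dilemma is designed to probe.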
📝 Abstract
Evaluating moral alignment in agents navigating conflicting, hierarchically structured human norms is a critical challenge at the intersection of AI safety, moral philosophy, and cognitive science. We introduce Morality Chains, a novel formalism for representing moral norms as ordered deontic constraints, and MoralityGym, a benchmark of 98 ethical-dilemma problems presented as trolley-dilemma-style Gymnasium environments. By decoupling task-solving from moral evaluation and introducing a novel Morality Metric, MoralityGym makes it possible to integrate insights from psychology and philosophy into the evaluation of norm-sensitive reasoning. Baseline results with safe RL methods reveal key limitations, underscoring the need for more principled approaches to ethical decision-making. This work provides a foundation for developing AI systems that behave more reliably, transparently, and ethically in complex real-world contexts.
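Since MoralityGym builds on Gymnasium, evaluation presumably follows the standard environment loop, with moral signals reported separately from the task reward. Below is a hedged sketch of that decoupling; the environment id `"MoralityGym/TrolleyJunction-v0"` and the `"moral_info"` info key are assumptions for illustration, while the `gym.make`/`reset`/`step` calls are standard Gymnasium API.

```python
import gymnasium as gym

# Assumed env id for illustration; the real registration names are
# defined by the MoralityGym benchmark itself.
env = gym.make("MoralityGym/TrolleyJunction-v0")

obs, info = env.reset(seed=0)
task_return, moral_log = 0.0, []
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # stand-in for a trained policy
    obs, reward, terminated, truncated, info = env.step(action)
    task_return += reward                      # task performance is scored here...
    moral_log.append(info.get("moral_info"))   # ...while moral judgment is tracked separately
env.close()
```

Keeping the moral record out of the reward stream is what lets a benchmark score an agent that solves the task while violating a norm differently from one that sacrifices task return to respect it.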