🤖 AI Summary
This study systematically evaluates the moral reasoning capabilities of large language models (LLMs) in high-stakes decision-making contexts and their alignment with empirically observed human moral preferences. We propose PRIME, a novel evaluation framework integrating consequentialist/deontological ethics, Moral Foundations Theory, and Kohlberg’s stages of moral development, coupled with a dual-protocol assessment—direct moral queries and analysis of responses to canonical ethical dilemmas. We quantitatively benchmark six state-of-the-art LLMs across five moral dimensions. Results reveal strong cross-model consensus on Care/Harm and Fairness/Cheating, yet systematic attenuation of Authority, Loyalty, and Sanctity. We introduce the first large-scale, multi-paradigm, reproducible cross-model ethical benchmark, demonstrating LLMs’ high decisiveness in ethical judgment, significant inter-model alignment, and overall fidelity to empirical human moral preferences—thereby establishing a methodological foundation for AI ethical alignment and uncovering structural biases in LLM moral cognition.
📝 Abstract
As large language models (LLMs) are increasingly deployed in consequential decision-making contexts, systematically assessing their ethical reasoning capabilities becomes a critical imperative. This paper introduces the Priorities in Reasoning and Intrinsic Moral Evaluation (PRIME) framework--a comprehensive methodology for analyzing moral priorities across foundational ethical dimensions including consequentialist-deontological reasoning, moral foundations theory, and Kohlberg's developmental stages. We apply this framework to six leading LLMs through a dual-protocol approach combining direct questioning and response analysis to established ethical dilemmas. Our analysis reveals striking patterns of convergence: all evaluated models demonstrate strong prioritization of care/harm and fairness/cheating foundations while consistently underweighting authority, loyalty, and sanctity dimensions. Through detailed examination of confidence metrics, response reluctance patterns, and reasoning consistency, we establish that contemporary LLMs (1) produce decisive ethical judgments, (2) demonstrate notable cross-model alignment in moral decision-making, and (3) generally correspond with empirically established human moral preferences. This research contributes a scalable, extensible methodology for ethical benchmarking while highlighting both the promising capabilities and systematic limitations in current AI moral reasoning architectures--insights critical for responsible development as these systems assume increasingly significant societal roles.