The Convergent Ethics of AI? Analyzing Moral Foundation Priorities in Large Language Models with a Multi-Framework Approach

📅 2025-04-27

📈 Citations: 0

✨ Influential: 0

career value

187K/year

🤖 AI Summary

This study systematically evaluates the moral reasoning capabilities of large language models (LLMs) in high-stakes decision-making contexts and their alignment with empirically observed human moral preferences. We propose PRIME, a novel evaluation framework integrating consequentialist/deontological ethics, Moral Foundations Theory, and Kohlberg’s stages of moral development, coupled with a dual-protocol assessment—direct moral queries and analysis of responses to canonical ethical dilemmas. We quantitatively benchmark six state-of-the-art LLMs across five moral dimensions. Results reveal strong cross-model consensus on Care/Harm and Fairness/Cheating, yet systematic attenuation of Authority, Loyalty, and Sanctity. We introduce the first large-scale, multi-paradigm, reproducible cross-model ethical benchmark, demonstrating LLMs’ high decisiveness in ethical judgment, significant inter-model alignment, and overall fidelity to empirical human moral preferences—thereby establishing a methodological foundation for AI ethical alignment and uncovering structural biases in LLM moral cognition.

Technology Category

Application Category

📝 Abstract

As large language models (LLMs) are increasingly deployed in consequential decision-making contexts, systematically assessing their ethical reasoning capabilities becomes a critical imperative. This paper introduces the Priorities in Reasoning and Intrinsic Moral Evaluation (PRIME) framework--a comprehensive methodology for analyzing moral priorities across foundational ethical dimensions including consequentialist-deontological reasoning, moral foundations theory, and Kohlberg's developmental stages. We apply this framework to six leading LLMs through a dual-protocol approach combining direct questioning and response analysis to established ethical dilemmas. Our analysis reveals striking patterns of convergence: all evaluated models demonstrate strong prioritization of care/harm and fairness/cheating foundations while consistently underweighting authority, loyalty, and sanctity dimensions. Through detailed examination of confidence metrics, response reluctance patterns, and reasoning consistency, we establish that contemporary LLMs (1) produce decisive ethical judgments, (2) demonstrate notable cross-model alignment in moral decision-making, and (3) generally correspond with empirically established human moral preferences. This research contributes a scalable, extensible methodology for ethical benchmarking while highlighting both the promising capabilities and systematic limitations in current AI moral reasoning architectures--insights critical for responsible development as these systems assume increasingly significant societal roles.

Problem

Research questions and friction points this paper is trying to address.

Assessing ethical reasoning in large language models

Analyzing moral priorities across multiple ethical frameworks

Identifying convergence and limitations in AI moral reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

PRIME framework for multi-dimensional ethical analysis

Dual-protocol approach combining direct and response analysis

Benchmarking LLMs on human moral preference alignment

🔎 Similar Papers

A Survey on Moral Foundation Theory and Pre-Trained Language Models: Current Advances and Challenges