Beyond Ethical Alignment: Evaluating LLMs as Artificial Moral Assistants

📅 2025-08-18

📈 Citations: 0

✨ Influential: 0

career value

200K/year

🤖 AI Summary

Existing LLM moral evaluation overly emphasizes value alignment while neglecting explicit moral reasoning capabilities. Method: We propose the first theoretical framework and benchmark for Artificial Moral Agents (AMAs), grounded in philosophical ethics. Our approach formalizes moral behavior models and systematically quantifies distinct reasoning types—particularly deductive and abductive reasoning—through a novel evaluation suite applied to leading open-source LLMs. Results/Contribution: While models perform reasonably on basic moral judgments, they exhibit consistent weaknesses in abductive reasoning tasks requiring value trade-offs and causal attribution; performance varies significantly across models. Crucially, we move beyond the “alignment-as-morality” paradigm by establishing moral deliberation—especially abductive reasoning—as a measurable, empirically grounded dimension. This provides both a theoretical foundation and empirical evidence for deep, fine-grained assessment and targeted improvement of AI moral competence.

Technology Category

Application Category

📝 Abstract

The recent rise in popularity of large language models (LLMs) has prompted considerable concerns about their moral capabilities. Although considerable effort has been dedicated to aligning LLMs with human moral values, existing benchmarks and evaluations remain largely superficial, typically measuring alignment based on final ethical verdicts rather than explicit moral reasoning. In response, this paper aims to advance the investigation of LLMs' moral capabilities by examining their capacity to function as Artificial Moral Assistants (AMAs), systems envisioned in the philosophical literature to support human moral deliberation. We assert that qualifying as an AMA requires more than what state-of-the-art alignment techniques aim to achieve: not only must AMAs be able to discern ethically problematic situations, they should also be able to actively reason about them, navigating between conflicting values outside of those embedded in the alignment phase. Building on existing philosophical literature, we begin by designing a new formal framework of the specific kind of behaviour an AMA should exhibit, individuating key qualities such as deductive and abductive moral reasoning. Drawing on this theoretical framework, we develop a benchmark to test these qualities and evaluate popular open LLMs against it. Our results reveal considerable variability across models and highlight persistent shortcomings, particularly regarding abductive moral reasoning. Our work connects theoretical philosophy with practical AI evaluation while also emphasising the need for dedicated strategies to explicitly enhance moral reasoning capabilities in LLMs. Code available at https://github.com/alessioGalatolo/AMAeval

Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' moral reasoning beyond superficial ethical alignment

Assessing LLMs as Artificial Moral Assistants for human deliberation

Developing benchmarks to test moral reasoning capabilities in LLMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Artificial Moral Assistants framework

Develops benchmark for moral reasoning evaluation

Highlights gaps in abductive moral reasoning

🔎 Similar Papers

No similar papers found.