🤖 AI Summary
This study investigates moral biases in large language models (LLMs) facing ethical dilemmas and examines how their responses differ across philosophical frameworks such as utilitarianism, deontology, and altruism. Using 27 trolley-problem variants, we systematically evaluate 14 mainstream LLMs under ten ethical frameworks, yielding 3,780 binary decisions with natural-language justifications. A factorial prompting protocol and multidimensional metrics (decisiveness, explanation-answer consistency, and alignment with public moral judgment) support the quantitative analysis. Results show that reasoning-augmented models produce more structured and decisive explanations, yet deviate significantly from human consensus in kinship-, law-, or self-interest-sensitive scenarios, highlighting persistent heterogeneity in current LLM moral alignment. We identify "sweet zones" for altruistic, fairness-based, and virtue-ethical framings, where decisions balance decisiveness, internal consistency, and agreement with aggregated human judgments. Crucially, we posit that moral prompting serves not only as a behavioral intervention tool but also as a novel diagnostic paradigm for uncovering latent value orientations.
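The factorial design and the 3,780-decision count (27 scenarios × 10 frames × 14 models) can be sketched directly. The Python sketch below enumerates the full grid; the scenario labels, frame names, prompt wording, and `query_model` stub are hypothetical placeholders for illustration, not the study's actual artifacts.

```python
import itertools

# Illustrative placeholders -- the study's actual scenario variants,
# frame names, and model list are not reproduced here.
SCENARIOS = [f"trolley_variant_{i:02d}" for i in range(1, 28)]   # 27 variants
FRAMES = ["utilitarian", "deontological", "altruistic", "fairness",
          "virtue", "kinship", "legalist", "egoist",
          "contractualist", "care"]                               # 10 frames
MODELS = [f"model_{i:02d}" for i in range(1, 15)]                 # 14 LLMs

def query_model(model: str, scenario: str, frame: str) -> dict:
    """Hypothetical stub: build a frame-conditioned prompt and record
    a binary decision plus a free-text justification."""
    prompt = (f"Under a strictly {frame} moral framework, consider the "
              f"dilemma '{scenario}'. Answer YES (intervene) or NO "
              f"(do not intervene), then justify your choice.")
    # A real run would call the provider's API here and parse the reply.
    return {"model": model, "scenario": scenario, "frame": frame,
            "decision": None, "justification": "", "prompt": prompt}

# Full factorial grid: 14 models x 27 scenarios x 10 frames = 3,780 cells.
records = [query_model(m, s, f)
           for m, s, f in itertools.product(MODELS, SCENARIOS, FRAMES)]
assert len(records) == 3_780
```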
📝 Abstract
As large language models (LLMs) increasingly mediate ethically sensitive decisions, understanding their moral reasoning processes becomes imperative. This study presents a comprehensive empirical evaluation of 14 leading LLMs, both reasoning-enabled and general-purpose, across 27 diverse trolley-problem scenarios framed by ten moral philosophies, including utilitarianism, deontology, and altruism. Using a factorial prompting protocol, we elicited 3,780 binary decisions and natural-language justifications, enabling analysis along four axes: decisional assertiveness, explanation-answer consistency, public moral alignment, and sensitivity to ethically irrelevant cues. Our findings reveal significant variability across ethical frames and model types: reasoning-enhanced models demonstrate greater decisiveness and more structured justifications, yet do not always align better with human consensus. Notably, "sweet zones" emerge in altruistic, fairness-based, and virtue-ethics framings, where models achieve a balance of high intervention rates, low explanation conflict, and minimal divergence from aggregated human judgments. However, models diverge under frames emphasizing kinship, legality, or self-interest, often producing ethically controversial outcomes. These patterns suggest that moral prompting is not only a behavioral modifier but also a diagnostic tool for uncovering latent alignment philosophies across providers. We advocate for moral reasoning to become a primary axis of LLM alignment, and call for standardized benchmarks that evaluate not just what LLMs decide, but how and why.
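The four analysis axes are named but not formally defined in the abstract; the sketch below shows one plausible operationalization of the first three, assuming each grid cell yields a parsed binary `decision`, a `refused` flag, a justification-implied `stated_decision`, and the scenario's aggregated human-majority choice `human_choice`. All field names and metric definitions here are assumptions for illustration; sensitivity to ethically irrelevant cues would additionally require matched scenario pairs and is omitted.

```python
from statistics import mean

def decisiveness(records):
    """Share of cells where the model commits to a binary answer
    rather than refusing or hedging (assumed definition)."""
    return mean(0.0 if r["refused"] else 1.0 for r in records)

def explanation_conflict(records):
    """Share of answered cells whose free-text justification implies
    the opposite choice from the stated binary answer."""
    answered = [r for r in records if not r["refused"]]
    return mean(1.0 if r["decision"] != r["stated_decision"] else 0.0
                for r in answered)

def human_divergence(records):
    """Share of answered cells disagreeing with the aggregated
    human-majority judgment for that scenario."""
    answered = [r for r in records if not r["refused"]]
    return mean(1.0 if r["decision"] != r["human_choice"] else 0.0
                for r in answered)

# Toy example: two answered cells and one refusal.
demo = [
    {"refused": False, "decision": 1, "stated_decision": 1, "human_choice": 1},
    {"refused": False, "decision": 0, "stated_decision": 1, "human_choice": 1},
    {"refused": True,  "decision": None, "stated_decision": None, "human_choice": 0},
]
print(decisiveness(demo))          # ~0.667: one refusal out of three cells
print(explanation_conflict(demo))  # 0.5: one conflicting justification
print(human_divergence(demo))      # 0.5: one answer against the human majority
```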