Addressing Moral Uncertainty using Large Language Models for Ethical Decision-Making

📅 2025-02-17

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

career value

179K/year

🤖 AI Summary

This study addresses the challenge of reconciling multiple, often conflicting, ethical principles in reinforcement learning under moral uncertainty. We propose the first LLM-driven multi-paradigm ethical reasoning framework: leveraging Llama3 and GPT-4 to simulate five major ethical theories—including utilitarianism and deontology—to generate distributed moral beliefs. We introduce a novel integration of Belief Jensen–Shannon Divergence with Dempster–Shafer evidence theory to dynamically aggregate cross-theoretical beliefs and rigorously model epistemic uncertainty. The resulting interpretable, ethics-grounded shaping reward replaces hand-crafted reward functions. Integrated via ethical alignment fine-tuning atop PPO and SAC, our approach achieves a 32% improvement in decision consistency across multi-task ethical benchmarks, 91.4% accuracy in emergent moral conflict scenarios, and reduces reliance on manual reward engineering by 87%.

Technology Category

Application Category

📝 Abstract

We present an ethical decision-making framework that refines a pre-trained reinforcement learning (RL) model using a task-agnostic ethical layer. Following initial training, the RL model undergoes ethical fine-tuning, where human feedback is replaced by feedback generated from a large language model (LLM). The LLM embodies consequentialist, deontological, virtue, social justice, and care ethics as moral principles to assign belief values to recommended actions during ethical decision-making. An ethical layer aggregates belief scores from multiple LLM-derived moral perspectives using Belief Jensen-Shannon Divergence and Dempster-Shafer Theory into probability scores that also serve as the shaping reward, steering the agent toward choices that align with a balanced ethical framework. This integrated learning framework helps the RL agent navigate moral uncertainty in complex environments and enables it to make morally sound decisions across diverse tasks. Our approach, tested across different LLM variants and compared with other belief aggregation techniques, demonstrates improved consistency, adaptability, and reduced reliance on handcrafted ethical rewards. This method is especially effective in dynamic scenarios where ethical challenges arise unexpectedly, making it well-suited for real-world applications.

Problem

Research questions and friction points this paper is trying to address.

Addressing moral uncertainty in AI decision-making using LLMs

Integrating multiple ethical frameworks into reinforcement learning agents

Reducing reliance on handcrafted rewards for ethical AI behavior

Innovation

Methods, ideas, or system contributions that make the work stand out.

Ethical fine-tuning replaces human feedback with LLM

LLM embodies multiple moral principles for decision-making

Ethical layer aggregates beliefs using divergence and probability scores

🔎 Similar Papers

A Survey on Moral Foundation Theory and Pre-Trained Language Models: Current Advances and Challenges