🤖 AI Summary
This work investigates the alignment of large language models (LLMs) with human moral values in high-stakes ethical decision-making, specifically the allocation of scarce organs (e.g., kidneys). We find that mainstream LLMs systematically deviate from human preferences in how they prioritize attributes, and that they overwhelmingly avoid expressing indecision, rarely employing ethically appropriate strategies such as refusal or randomization under moral uncertainty. To address this, we propose a few-shot supervised fine-tuning method based on low-rank adaptation (LoRA) that jointly optimizes decision consistency and uncertainty-aware calibration without compromising task performance. Experiments demonstrate significant improvements in alignment with human preference rankings and in faithful expression of indecision. Our approach is the first to systematically identify and mitigate the "determinism bias" inherent in LLMs' moral reasoning, establishing a novel paradigm for ethically aligned, trustworthy AI.
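The summary does not include code, but the core idea behind LoRA, training a small low-rank update on top of frozen pretrained weights, can be sketched in a few lines. The dimensions, rank, and initialization below are illustrative assumptions for a single weight matrix, not the authors' actual fine-tuning setup:

```python
import numpy as np

# Low-rank adaptation (LoRA) in miniature: instead of updating a full
# weight matrix W (d_out x d_in), train a low-rank correction B @ A with
# rank r << min(d_out, d_in), so far fewer parameters are updated.
d_out, d_in, r = 768, 768, 8  # illustrative sizes, not the paper's

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, init 0

def adapted_forward(x):
    """Forward pass with the LoRA correction applied: (W + B @ A) @ x."""
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
# Because B starts at zero, the adapted model is initially identical to
# the frozen base model; fine-tuning only moves the B @ A correction.
assert np.allclose(adapted_forward(x), W @ x)

full_params = d_out * d_in          # 589,824 if W were fully trained
lora_params = r * (d_out + d_in)    # 12,288 trainable LoRA parameters
print(f"trainable: {lora_params} vs full {full_params} "
      f"({100 * lora_params / full_params:.1f}%)")
```

At rank 8 this trains about 2% of the parameters of the full matrix, which is why a handful of supervised examples can suffice for the kind of behavioral adjustment described above.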
📝 Abstract
The rapid integration of Large Language Models (LLMs) into high-stakes decision-making -- such as allocating scarce resources like donor organs -- raises critical questions about their alignment with human moral values. We systematically evaluate several prominent LLMs against human preferences in kidney allocation scenarios and show that they: i) exhibit stark deviations from human values in how they prioritize various attributes, and ii) unlike humans, rarely express indecision, defaulting to deterministic decisions even when indecision mechanisms (e.g., coin flipping) are made available. Nonetheless, we show that low-rank supervised fine-tuning with few samples is often effective at improving both decision consistency and the calibrated expression of indecision. These findings underscore the necessity of explicit alignment strategies for LLMs in moral and ethical domains.