What Does the AI Doctor Value? Auditing Pluralism in the Clinical Ethics of Language Models

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

career value

157K/year

🤖 AI Summary

Current large language models in healthcare lack systematic evaluation of ethical value pluralism, risking the imposition of a singular value orientation that obscures the diversity inherent in clinical practice. This work proposes the first evaluative framework for value pluralism in medical AI, integrating a clinician-validated benchmark of ethical dilemmas, decision attribution analysis, multi-round sampling, and semantic variant testing, along with a novel method to quantify value prioritization. The study reveals that while mainstream models broadly capture the heterogeneity of clinicians’ ethical values, certain models significantly underweight patient autonomy, thereby posing a risk of amplifying a narrow ethical bias upon deployment. These findings underscore the critical need for systematic auditing of value alignment in medical AI systems.

📝 Abstract

Medicine is inherently pluralistic. Principles such as autonomy, beneficence, nonmaleficence, and justice routinely conflict, and such ethical dilemmas often sharply divide reasonable physicians. Good clinical practice navigates these tensions in concert with each patient's values rather than imposing a single ethical stance. The ethical values that large language models bring to medical advice, however, have not been systematically examined. We present a framework for auditing value pluralism in medical AI, comprising a benchmark of clinician-verified dilemmas and an attribution method that recovers value priorities directly from decisions. The ecosystem of frontier models spans physician-level value heterogeneity, and models discuss competing values in their reasoning (Overton pluralism) before committing to a decision. However, individual model decisions are near-deterministic across repeated sampling and semantic variations, failing to reproduce the distributional pluralism of the physician panel. Across benchmark cases, these consistent decisions reflect committed, systematic value preferences. While most model priorities fall within the natural range of inter-physician variation, some significantly underweight patient autonomy. A single LLM deployed without regard for its value priorities could amplify those priorities at scale to every patient it serves. Without explicit efforts to balance ethical perspectives with one or multiple models, these tools risk replacing clinical pluralism with a deployment monoculture.

Problem

Research questions and friction points this paper is trying to address.

value pluralism

clinical ethics

large language models

medical AI

ethical dilemmas

Innovation

Methods, ideas, or system contributions that make the work stand out.

value pluralism

clinical ethics

large language models