SafeMed-R1: Clinician-Audited Safety and Ethics Alignment for Medical Large Language Models

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current medical large language models perform well on examinations, but their clinical deployment remains limited by insufficient auditable reasoning, inadequate alignment with safety and ethical standards, and limited robustness against adversarial misuse. This work presents SafeMed-R1, a model trained with a traceable Clinical Trust Signals (CTS) pipeline that links each reasoning instance to clinician rubric scores and edit histories. The model is further aligned through safety and ethics supervision and stress-tested by red teaming. The authors suggest that this clinician-audited supervision provenance can strengthen governance-relevant evidence without relying on inference-time retrieval or citation grounding. SafeMed-R1 attains a macro-averaged accuracy of 79.6% across clinical benchmarks and, under adversarial safety testing, reduces unsafe outputs by about 3 to 5% relative to its baseline. In a paired expert study of 30 medication safety vignettes, it matches PGY1 and PGY2 residents on medical correctness and scores higher on medication safety, guideline consistency, and clinical usefulness.

📝 Abstract

Large language models (LLMs) increasingly match expert performance on licensing examinations, yet routine clinical use remains limited because governance requires auditable reasoning, safety and ethics alignment, and resilience to adversarial misuse. Here we present SafeMed-R1, trained with a traceable Clinical Trust Signals (CTS) pipeline that links each reasoning instance to clinician rubric scores and edit histories, and aligned through safety and ethics supervision and red-team stress testing. SafeMed-R1 attains a macro-averaged accuracy of 79.6% across clinical benchmarks. Under adversarial safety testing, it shows the lowest aggregated risk and reduces unsafe outputs by about 3 to 5% relative to its baseline. In a paired expert study of 30 medication safety vignettes, SafeMed-R1 matches PGY1 and PGY2 residents on medical correctness and scores higher for medication safety, guideline consistency, and clinical usefulness. Collectively, these results suggest that clinician-audited supervision provenance, together with domain-tailored safety and ethics alignment, can strengthen governance-relevant evidence without relying on inference-time retrieval or citation grounding.
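The abstract reports a macro-averaged accuracy, which weights every benchmark equally regardless of its size. The sketch below shows how such a figure is typically computed and how it differs from a micro-average pooled over all questions. The benchmark names and counts are toy values chosen for illustration, not results from the paper, and the function names are hypothetical.

```python
def macro_average_accuracy(per_benchmark):
    """Mean of per-benchmark accuracies; each benchmark counts equally.

    per_benchmark maps a benchmark name to (num_correct, num_total).
    """
    if not per_benchmark:
        raise ValueError("at least one benchmark is required")
    accuracies = [correct / total for correct, total in per_benchmark.values()]
    return sum(accuracies) / len(accuracies)


def micro_average_accuracy(per_benchmark):
    """Accuracy pooled over all questions; large benchmarks dominate."""
    total_correct = sum(correct for correct, _ in per_benchmark.values())
    total_items = sum(total for _, total in per_benchmark.values())
    return total_correct / total_items


# Toy counts (assumed for illustration only, not taken from the paper).
toy_results = {
    "benchmark_a": (90, 100),    # 0.90
    "benchmark_b": (30, 50),     # 0.60
    "benchmark_c": (600, 1000),  # 0.60
}

macro = macro_average_accuracy(toy_results)
micro = micro_average_accuracy(toy_results)
print(f"macro={macro:.3f} micro={micro:.3f}")
```

With these toy counts the macro-average is pulled up by the small, high-scoring benchmark, while the micro-average stays closer to the largest benchmark's accuracy. This is why the averaging scheme matters when comparing headline numbers across papers.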

Problem

Research questions and friction points this paper is trying to address.

medical large language models

safety alignment

ethics alignment

adversarial misuse

auditable reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Clinical Trust Signals

safety and ethics alignment

adversarial red teaming
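To make the Clinical Trust Signals idea concrete, the following is a minimal sketch of a provenance record that ties one reasoning instance to clinician rubric scores and an edit history, as the abstract describes. The paper's actual schema is not given here, so every class, field, and rubric name below is a hypothetical stand-in.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EditEvent:
    """One clinician edit (hypothetical schema): who changed what."""
    editor_id: str
    before: str
    after: str


@dataclass
class TrustRecord:
    """Links a reasoning instance to rubric scores and its edit history."""
    instance_id: str
    reasoning_text: str
    rubric_scores: dict = field(default_factory=dict)   # rubric item -> score
    edit_history: list = field(default_factory=list)    # EditEvent objects, oldest first

    def apply_edit(self, editor_id: str, new_text: str) -> None:
        # Record the change before overwriting so the trail stays complete.
        self.edit_history.append(EditEvent(editor_id, self.reasoning_text, new_text))
        self.reasoning_text = new_text

    def original_text(self) -> str:
        # The pre-edit text is recoverable from the first recorded edit.
        if self.edit_history:
            return self.edit_history[0].before
        return self.reasoning_text


# Illustrative usage with made-up content.
record = TrustRecord("case-001", "Draft reasoning.")
record.rubric_scores["guideline_consistency"] = 4
record.apply_edit("clinician-7", "Revised reasoning.")
print(record.original_text(), "->", record.reasoning_text, len(record.edit_history))
```

Keeping edits as an append-only list is one simple way to make supervision auditable after training: the final text, the original draft, and each reviewer's change can all be reconstructed from the record alone.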