SafeMed-R1: Clinician-Audited Safety and Ethics Alignment for Medical Large Language Models

📅 2026-05-27
🤖 AI Summary
While current medical large language models perform strongly on licensing examinations, their clinical deployment remains hindered by insufficiently auditable reasoning, inadequate alignment with safety and ethical standards, and limited robustness against adversarial misuse. This work proposes SafeMed-R1, a model trained with a supervision provenance mechanism based on Clinical Trust Signals (CTS), which links each training reasoning instance to clinician rubric scores and edit histories. It further integrates domain-specific safety and ethics alignment and red-team evaluation. Notably, the approach aims to provide governance-relevant evidence without requiring retrieval at inference time. Experimental results show that SafeMed-R1 achieves 79.6% macro-averaged accuracy across clinical benchmarks, reduces unsafe outputs by about 3–5% relative to its baseline under adversarial testing, and matches PGY1/PGY2 residents on medical correctness across 30 medication safety vignettes, while scoring higher on medication safety, guideline adherence, and clinical utility.
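The 79.6% figure is reported as a macro-average. Under the standard definition, accuracy is computed separately for each benchmark and the per-benchmark scores are then averaged with equal weight, so a small benchmark counts as much as a large one. The sketch below contrasts this with pooled (micro) accuracy; the benchmark names and answers are invented toy data, not results from the paper, and the paper's actual benchmark list is not given here.

```python
from typing import Mapping, Sequence, Tuple

# Each benchmark maps to (predictions, reference labels).
Benchmarks = Mapping[str, Tuple[Sequence[str], Sequence[str]]]


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of items where the prediction equals the reference label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def macro_accuracy(per_benchmark: Benchmarks) -> float:
    """Unweighted mean of per-benchmark accuracies: each benchmark counts once."""
    scores = [accuracy(preds, gold) for preds, gold in per_benchmark.values()]
    return sum(scores) / len(scores)


def micro_accuracy(per_benchmark: Benchmarks) -> float:
    """Pooled accuracy over all items, for contrast: large benchmarks dominate."""
    correct = total = 0
    for preds, gold in per_benchmark.values():
        correct += sum(p == y for p, y in zip(preds, gold))
        total += len(gold)
    return correct / total


# Hypothetical toy data: benchmark names and answers are invented for illustration.
toy = {
    "bench_small": (["A", "B"], ["A", "C"]),        # 1/2 correct -> 0.5
    "bench_large": (["A"] * 8, ["A"] * 7 + ["B"]),  # 7/8 correct -> 0.875
}
print(macro_accuracy(toy))  # 0.6875
print(micro_accuracy(toy))  # 0.8
```

The gap between 0.6875 and 0.8 in this toy case shows why the averaging scheme matters when comparing headline accuracy numbers across papers.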
πŸ“ Abstract
Large language models (LLMs) increasingly match expert performance on licensing examinations, yet routine clinical use remains limited because governance requires auditable reasoning, safety and ethics alignment, and resilience to adversarial misuse. Here we present SafeMed-R1, trained with a traceable Clinical Trust Signals (CTS) pipeline that links each reasoning instance to clinician rubric scores and edit histories, and aligned through safety and ethics supervision and red-team stress testing. SafeMed-R1 attains a macro-averaged accuracy of 79.6% across clinical benchmarks. Under adversarial safety testing, it shows the lowest aggregated risk among the evaluated models and reduces unsafe outputs by about 3 to 5% relative to its baseline. In a paired expert study of 30 medication safety vignettes, SafeMed-R1 matches PGY1 and PGY2 residents on medical correctness and scores higher for medication safety, guideline consistency, and clinical usefulness. Collectively, these results suggest that clinician-audited supervision provenance, together with domain-tailored safety and ethics alignment, can strengthen governance-relevant evidence without relying on inference-time retrieval or citation grounding.
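The abstract states that the CTS pipeline links each reasoning instance to clinician rubric scores and edit histories, but it does not specify a data format. As a rough illustration only, the Python sketch below shows one way such a traceable record could be structured. The class and field names (`ReasoningInstance`, `RubricScore`, `Edit`, `draft`), the SHA-256 fingerprint over a JSON dump, and the edit-replay check are all hypothetical choices made for this example, not the authors' implementation.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class RubricScore:
    reviewer_id: str   # hypothetical pseudonymous clinician ID
    criterion: str     # illustrative criterion name, e.g. "medication_safety"
    score: int         # assumed integer rubric scale (e.g. 1-5)


@dataclass(frozen=True)
class Edit:
    editor_id: str
    timestamp: str     # ISO 8601 text, kept as a string for simplicity
    before: str        # text before this edit
    after: str         # text after this edit


@dataclass(frozen=True)
class ReasoningInstance:
    instance_id: str
    prompt: str
    draft: str         # hypothetical: model-generated reasoning before review
    reasoning: str     # final reasoning text used for supervision
    scores: Tuple[RubricScore, ...] = ()
    edits: Tuple[Edit, ...] = ()

    def provenance_digest(self) -> str:
        """SHA-256 of a canonical JSON dump: changing text, scores, or edits changes it."""
        payload = json.dumps(asdict(self), sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def edit_chain_is_consistent(self) -> bool:
        """Replay edits from the draft and check they end at the final reasoning."""
        current = self.draft
        for edit in self.edits:
            if edit.before != current:
                return False
            current = edit.after
        return current == self.reasoning

    def mean_score(self, criterion: str) -> float:
        """Average rubric score for one criterion across reviewers."""
        values = [s.score for s in self.scores if s.criterion == criterion]
        if not values:
            raise KeyError(f"no rubric scores for {criterion!r}")
        return sum(values) / len(values)


# Usage with invented example content (not data from the paper).
draft = "Start drug X at dose A."
final = "Start drug X at dose B; check renal function first."
inst = ReasoningInstance(
    instance_id="demo-001",
    prompt="Hypothetical medication safety vignette",
    draft=draft,
    reasoning=final,
    scores=(RubricScore("rev-1", "medication_safety", 4),
            RubricScore("rev-2", "medication_safety", 5)),
    edits=(Edit("rev-1", "2026-01-01T00:00:00Z", draft, final),),
)
print(inst.edit_chain_is_consistent())       # True
print(inst.mean_score("medication_safety"))  # 4.5

# Silently reverting the final text breaks both the digest and the edit chain.
tampered = replace(inst, reasoning=draft)
print(tampered.provenance_digest() == inst.provenance_digest())  # False
print(tampered.edit_chain_is_consistent())                       # False
```

Hashing a canonical serialization is one simple way to make a record tamper-evident; a production governance pipeline might instead rely on an append-only store or signed audit logs, which this sketch does not attempt.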
Problem

Research questions and friction points this paper is trying to address.

medical large language models
safety alignment
ethics alignment
adversarial misuse
auditable reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Clinical Trust Signals
safety and ethics alignment
adversarial red teaming
auditable reasoning
medical LLM governance
Chao Ding
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Mouxiao Bian
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Tianbin Li
Shanghai Artificial Intelligence Laboratory
Machine Learning, Computer Vision, General Intelligence
Minjia Yuan
Joint Laboratory of Biomedical Artificial Intelligence, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, China
Yidong Jiang
School of Computer Science and Technology, Tongji University, Shanghai, China
Yankai Jiang
Shanghai AI Laboratory
Multimodal LLM, Vision-Language Pretraining, AI for Science
Jinru Ding
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Jiayuan Chen
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Zhuangzhi Gao
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Pengcheng Chen
Shanghai Artificial Intelligence Laboratory, Shanghai, China; University of Washington, Washington, USA
Zhao He
Department of Eye and Vision Sciences, University of Liverpool, Liverpool, United Kingdom; Liverpool Centre for Cardiovascular Science, University of Liverpool, Liverpool, United Kingdom
Rongzhao Zhang
Shanghai AI Lab
Medical Image Analysis, Computer Vision
Meiling Liu
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Luyi Jiang
Shanghai Institute of Infectious Disease and Biosecurity, Fudan University, Shanghai, China; Shanghai Health Development Research Center (Shanghai Medical Information Center), Shanghai, China
Jie Xu
Shanghai Artificial Intelligence Laboratory, Shanghai, China