CLR-voyance: Reinforcing Open-Ended Reasoning for Inpatient Clinical Decision Support with Outcome-Aware Rubrics

📅 2026-05-10
🤖 AI Summary
This work addresses clinical decision-making in inpatient settings, where partial observability complicates reasoning: the clinician must act before the consequences of an action are visible. Existing approaches often reduce the task to closed-set retrieval or lack clinically verifiable scoring mechanisms. To overcome these limitations, the study formulates the task as a partially observable Markov decision process (POMDP) and introduces what the authors describe as the first adaptive, case-specific rubric for inpatient clinical reasoning, grounded in real patient outcomes and validated by clinicians. The evaluation framework strictly separates historical information, which the policy can see, from future information, which only the oracle can see. Qwen3-8B and MedGemma-4B are post-trained with GRPO and then merged, and a clinician alignment study supplies rubric curation, response grading, and blinded pairwise preferences. The resulting model, CLR-voyance-8B, achieves 84.91% on the CLR-POMDP benchmark, ahead of GPT-5 (77.83%) and MedGemma-27B (66.66%), with comparable or better performance on existing medical benchmarks. The system has been deployed at a partner public hospital for over six months.
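The separation of historical from future information can be illustrated with a minimal sketch. This is not the paper's implementation: the `Event` type, its `hour` and `note` fields, the `split_journey` and `build_policy_prompt` helpers, and the idea of cutting at a single time point are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """One entry in a patient journey (hypothetical schema)."""
    hour: float  # hours since admission
    note: str    # free-text clinical note


def split_journey(events: List[Event], cut_hour: float) -> Tuple[List[Event], List[Event]]:
    """Split a journey into a policy-visible past and an oracle-only future.

    Events at or before cut_hour go to the past; later events go to the future.
    """
    ordered = sorted(events, key=lambda e: e.hour)
    past = [e for e in ordered if e.hour <= cut_hour]
    future = [e for e in ordered if e.hour > cut_hour]
    return past, future


def build_policy_prompt(past: List[Event]) -> str:
    """Build the text shown to the policy from past events only."""
    return "\n".join(f"[t+{e.hour:g}h] {e.note}" for e in past)


journey = [
    Event(30.0, "Blood cultures positive"),
    Event(2.0, "Admitted with fever"),
    Event(12.0, "Empirical antibiotics started"),
]
past, future = split_journey(journey, cut_hour=12.0)
prompt = build_policy_prompt(past)
```

Because the prompt is built only from `past`, nothing recorded after the cut point can reach the policy; the `future` list would be reserved for checking the rubric.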
📝 Abstract
Inpatient clinical reasoning is a sequential decision under partial observability: the clinician sees the admission so far and must choose the next action whose downstream consequences are not yet visible. Existing clinical-LLM evaluations and RL reward signals collapse this into closed-form retrieval, clinical journey leakage, or unanchored LLM-as-judge scoring. We introduce CLR-voyance, a framework that reformulates inpatient reasoning as a Partially Observable Markov Decision Process (POMDP) and supervises it with rewards that are simultaneously outcome-grounded and clinician-validated. We instantiate the formulation as CLR-POMDP, which partitions successful patient journeys into a policy-visible past and an oracle-only future. Using the past information, an oracle LLM generates a case-specific query-answer pair and the first adaptive rubric for clinical reasoning, which is verifiable against the future of the patient journey. These rubrics are used for both post-training and evaluation of models for inpatient clinical reasoning. We post-train Qwen3-8B and MedGemma-4B with GRPO followed by model merging, yielding state-of-the-art inpatient clinical reasoning while retaining generalist capabilities. CLR-voyance-8B achieves 84.91% on CLR-POMDP, ahead of frontier medical reasoning models like GPT-5 (77.83%) and MedGemma-27B (66.66%), and has comparable or better performance on existing medical benchmarks. To ensure a clinically meaningful setting, we conduct a large-scale clinician alignment study, where physicians curate per-case rubrics, grade candidate responses, and provide blinded pairwise preferences of model reasoning. This study provides insights on clinical LLM-as-a-judge and clinical preference-model selection, which can inform the community at large. CLR-voyance has been deployed for 6+ months at a partner public hospital, drafting thousands of reasoning-heavy inpatient notes.
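To make the link between rubrics and GRPO concrete, here is a minimal sketch of how a rubric score could serve as a reward and how GRPO-style group-relative advantages are computed from a group of sampled responses. The abstract does not specify how rubric items are weighted or aggregated, so `rubric_reward` (a weighted fraction of satisfied criteria) is an assumption, as are the function names and the use of the population standard deviation. The advantage normalization follows the commonly described GRPO form: reward minus group mean, divided by group standard deviation.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def rubric_reward(weights: Sequence[float], satisfied: Sequence[bool]) -> float:
    """Weighted fraction of rubric criteria a response satisfies (assumed aggregation)."""
    if len(weights) != len(satisfied):
        raise ValueError("weights and satisfied must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    return sum(w for w, ok in zip(weights, satisfied) if ok) / total


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the other responses sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rubric with three criteria and four sampled responses.
weights = [2.0, 1.0, 1.0]
graded = [
    [True, True, True],
    [True, False, True],
    [False, True, False],
    [False, False, False],
]
rewards = [rubric_reward(weights, g) for g in graded]
advantages = group_relative_advantages(rewards)
```

Responses that satisfy more of the rubric than their group average receive positive advantages and the rest negative ones; if every response in a group scores the same, all advantages are zero and that group contributes no learning signal.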
Problem

Research questions and friction points this paper is trying to address.

clinical reasoning
partially observable decision making
outcome-aware evaluation
inpatient decision support
open-ended reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

POMDP
outcome-aware rubrics
clinical reasoning
GRPO
clinician alignment
Aishik Nagar
Machine Learning Engineer, ASUS Intelligent Cloud Services (AICS)
AI for clinical care, AI for healthcare, Cognitive AI, Embodied AI, Multimodal AI
Arun-Kumar Kaliya-Perumal
Nanyang Technological University, Singapore
Musculoskeletal Health, Orthopaedic Spine Surgery, Disease Modeling, Genetics
Yu-Hsuan Han
Department of Family Medicine, Taipei Veterans General Hospital
Andrew Sheng-Han Huang
School of Medicine, National Yang Ming Chiao Tung University
Kristen Kee
Yong Loo Lin School of Medicine, National University of Singapore
Yushi Cao
Nanyang Technological University
Deep Reinforcement Learning, Trustworthy AI
Yiming Chen
ASUS Intelligent Cloud Services (AICS)
Hongchao Jiang
Research Fellow, Alibaba-NTU Joint Research Institute