🤖 AI Summary
Existing medical large language models largely overlook the clinical utility of electronic health records (EHRs) and focus narrowly on diagnosis recommendation, which limits their practical applicability. To address this, we propose DiaLLM, the first medical LLM that integrates heterogeneous EHR data into clinically grounded dialogues supporting clinical test recommendation, result interpretation, and diagnosis prediction. Methodologically, we introduce a Clinical Test Reference (CTR) strategy that builds these dialogues from EHRs by mapping each clinical code to its description and classifying test results as normal or abnormal. We further design a reinforcement learning framework for evidence acquisition and diagnosis over a large action space: rejection sampling reduces redundant actions and improves exploration efficiency, while a confirmation reward and a class-sensitive diagnosis reward guide accurate diagnosis prediction. Experiments show that DiaLLM outperforms baselines in both clinical test recommendation and diagnosis prediction.
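The clinical code mapping and lab result classification mentioned above can be illustrated with a minimal Python sketch. It assumes that each lab result carries a code, a numeric value, and optional reference-range bounds, and that a value outside its reference range is labeled "abnormal". The code dictionary, the codes `LAB-001` and `LAB-002`, their descriptions and ranges, and the helpers `describe`, `classify`, and `to_dialogue_turn` are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LabResult:
    code: str                  # clinical code as stored in the EHR
    value: float               # numeric test result
    ref_low: Optional[float]   # lower reference bound (None = unbounded)
    ref_high: Optional[float]  # upper reference bound (None = unbounded)
    unit: str = ""


# Hypothetical code-to-description table; a real system would use a
# terminology resource shipped with the EHR dataset.
CODE_DESCRIPTIONS: Dict[str, str] = {
    "LAB-001": "Serum creatinine",
    "LAB-002": "White blood cell count",
}


def describe(code: str, table: Dict[str, str] = CODE_DESCRIPTIONS) -> str:
    """Map a clinical code to its description, falling back to the raw code."""
    return table.get(code, code)


def classify(result: LabResult) -> str:
    """Assumed rule: a value outside the reference range is 'abnormal'."""
    if result.ref_low is not None and result.value < result.ref_low:
        return "abnormal"
    if result.ref_high is not None and result.value > result.ref_high:
        return "abnormal"
    return "normal"


def to_dialogue_turn(result: LabResult) -> str:
    """Render one result as a readable line for a clinical dialogue."""
    unit = f" {result.unit}" if result.unit else ""
    return f"{describe(result.code)}: {result.value:g}{unit} ({classify(result)})"


if __name__ == "__main__":
    # Illustrative values only, not patient data.
    print(to_dialogue_turn(LabResult("LAB-001", 2.4, 0.5, 1.2, "mg/dL")))
    # prints: Serum creatinine: 2.4 mg/dL (abnormal)
```

Turning raw codes and numbers into named, labeled findings is what lets a language model read EHR evidence as dialogue text rather than as opaque identifiers.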
📝 Abstract
Recent advances in Large Language Models (LLMs) have led to remarkable progress in medical consultation. However, existing medical LLMs overlook the essential role of Electronic Health Records (EHR) and focus primarily on diagnosis recommendation, limiting their clinical applicability. We propose DiaLLM, the first medical LLM that integrates heterogeneous EHR data into clinically grounded dialogues, enabling clinical test recommendation, result interpretation, and diagnosis prediction to better align with real-world medical practice. To construct clinically grounded dialogues from EHR, we design a Clinical Test Reference (CTR) strategy that maps each clinical code to its corresponding description and classifies test results as "normal" or "abnormal". Additionally, DiaLLM employs a reinforcement learning framework for evidence acquisition and automated diagnosis. To handle the large action space, we introduce a rejection sampling strategy that reduces redundancy and improves exploration efficiency. Furthermore, we design a confirmation reward and a class-sensitive diagnosis reward to guide accurate diagnosis prediction. Extensive experiments demonstrate that DiaLLM outperforms baselines in clinical test recommendation and diagnosis prediction.
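The abstract does not define rejection sampling, the confirmation reward, or the class-sensitive diagnosis reward in detail, so the following Python sketch is only one plausible reading under stated assumptions. It assumes that rejection sampling discards proposed tests already requested in the current dialogue; that the confirmation reward pays a bonus when a requested test is recorded as abnormal in the patient's EHR, nothing when it is normal, and a small penalty when it is absent; and that the class-sensitive reward scales a correct diagnosis by inverse class frequency so rare diagnoses count more. All function names, reward values, and weighting choices are hypothetical.

```python
import math
from typing import Callable, Dict, Optional, Set


def rejection_sample(propose: Callable[[], str],
                     already_requested: Set[str],
                     max_tries: int = 10) -> Optional[str]:
    """Assumed scheme: redraw from the policy until a not-yet-requested test appears."""
    for _ in range(max_tries):
        action = propose()
        if action not in already_requested:
            return action
    return None  # every draw was redundant; the caller may move on to diagnosis


def confirmation_reward(test: str,
                        patient_tests: Dict[str, str],
                        bonus: float = 1.0,
                        penalty: float = -0.1) -> float:
    """Hypothetical form: reward tests whose recorded result is abnormal."""
    status = patient_tests.get(test)
    if status == "abnormal":
        return bonus     # the test surfaced informative evidence
    if status == "normal":
        return 0.0       # recorded but uninformative
    return penalty       # not in the EHR, so no evidence can be returned


def class_sensitive_reward(predicted: str,
                           gold: str,
                           class_counts: Dict[str, int]) -> float:
    """Hypothetical form: inverse-frequency weight for correct diagnoses."""
    if predicted != gold:
        return 0.0
    total = sum(class_counts.values())
    # total / (K * count) equals 1.0 for every class when classes are balanced
    return total / (len(class_counts) * class_counts[gold])


if __name__ == "__main__":
    proposals = iter(["CBC", "CBC", "Lactate"])
    print(rejection_sample(lambda: next(proposals), {"CBC"}))  # prints: Lactate
    print(class_sensitive_reward("B", "B", {"A": 90, "B": 10}))  # prints: 5.0
    assert math.isclose(class_sensitive_reward("A", "A", {"A": 50, "B": 50}), 1.0)
```

Inverse-frequency weighting is a common way to counter class imbalance: with 90 cases of one diagnosis and 10 of another, a correct rare diagnosis earns 5.0 while a correct common one earns about 0.56, so the policy is not rewarded for always predicting the majority class.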