🤖 AI Summary
To address the clinical burden and elevated diagnostic error risk caused by lengthy EHR narratives, as well as the limited interpretability of LLM-based diagnosis, this paper proposes DR.KNOWS, a knowledge-enhanced diagnostic reasoning system that integrates the UMLS knowledge graph with large language models. It features a two-stage knowledge retrieval mechanism: (1) a stacked graph isomorphism network (SGIN) for graph-structure-aware semantic matching; and (2) an attention-based path ranker that generates clinically interpretable reasoning paths. The authors further design a human evaluation framework centered on clinical safety and diagnostic reasoning alignment. On two real-world EHR datasets, prompt-tuned T5 significantly outperforms QuickUMLS and mainstream LLMs (ROUGE-L +12.3%, CUI F1 +18.7%). Human evaluation confirms that DR.KNOWS’s reasoning trajectories align with established clinical reasoning norms.
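The attention-based path ranking step described above can be sketched roughly as follows. This is an illustrative approximation only: the function name, vector shapes, and dot-product-plus-softmax scoring are assumptions, not the paper's actual implementation, which operates over SGIN node embeddings.

```python
import numpy as np

def rank_paths(patient_vec, path_vecs):
    """Toy attention-style path ranker (illustrative, not DR.KNOWS itself).

    patient_vec: (d,) embedding of the patient's clinical context.
    path_vecs:   (n, d) embeddings of n candidate knowledge paths.
    Returns indices of paths sorted by attention weight, and the weights.
    """
    # Similarity score between the patient context and each candidate path.
    scores = path_vecs @ patient_vec
    # Softmax (shifted by the max for numerical stability) -> attention weights.
    exp = np.exp(scores - scores.max())
    weights = exp / exp.sum()
    # Highest-weight paths first.
    order = np.argsort(-weights)
    return order, weights
```

In the full system, the top-ranked paths would then be serialized into the LLM prompt as contextually relevant medical knowledge.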
📝 Abstract
BACKGROUND
Electronic health records (EHRs) and routine documentation practices play a vital role in patients' daily care, providing a holistic record of health, diagnoses, and treatment. However, complex and verbose EHR narratives can overwhelm health care providers, increasing the risk of diagnostic inaccuracies. While large language models (LLMs) have showcased their potential in diverse language tasks, their application in health care must prioritize the minimization of diagnostic errors and the prevention of patient harm. Integrating knowledge graphs (KGs) into LLMs offers a promising approach because structured knowledge from KGs could enhance LLMs' diagnostic reasoning by providing contextually relevant medical information.
OBJECTIVE
This study introduces DR.KNOWS (Diagnostic Reasoning Knowledge Graph System), a model that integrates Unified Medical Language System-based KGs with LLMs to improve diagnostic predictions from EHR data by retrieving contextually relevant paths aligned with patient-specific information.
METHODS
DR.KNOWS combines a stack graph isomorphism network for node embedding with an attention-based path ranker to identify and rank knowledge paths relevant to a patient's clinical context. We evaluated DR.KNOWS on 2 real-world EHR datasets from different geographic locations, comparing its performance to baseline models, including QuickUMLS and standard LLMs (Text-to-Text Transfer Transformer and ChatGPT). To assess diagnostic reasoning quality, we designed and implemented a human evaluation framework grounded in clinical safety metrics.
RESULTS
DR.KNOWS demonstrated notable improvements over baseline models, showing higher accuracy in extracting diagnostic concepts and enhanced diagnostic prediction metrics. Prompt-based fine-tuning of Text-to-Text Transfer Transformer with DR.KNOWS knowledge paths achieved the highest ROUGE-L (Recall-Oriented Understudy for Gisting Evaluation-Longest Common Subsequence) and concept unique identifier (CUI) F1-scores, highlighting the benefits of KG integration. Human evaluators found the diagnostic rationales of DR.KNOWS to be strongly aligned with correct clinical reasoning, indicating improved abstraction and reasoning. Recognized limitations include potential biases within the KG data, which we addressed by emphasizing case-specific path selection and proposing future bias-mitigation strategies.
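Under a standard interpretation, the concept unique identifier F1-score reported above is set-overlap F1 between predicted and gold UMLS CUIs; a minimal sketch (the function name and interface are assumptions, not the paper's evaluation code):

```python
def cui_f1(pred_cuis, gold_cuis):
    """Set-overlap F1 over UMLS concept unique identifiers (illustrative)."""
    pred, gold = set(pred_cuis), set(gold_cuis)
    tp = len(pred & gold)  # CUIs predicted and present in the gold diagnoses
    if tp == 0:
        return 0.0  # covers empty inputs and zero overlap
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting two CUIs of which one matches a two-CUI gold set gives precision = recall = 0.5, so F1 = 0.5.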
CONCLUSIONS
DR.KNOWS offers a robust approach for enhancing diagnostic accuracy and reasoning by integrating structured KG knowledge into LLM-based clinical workflows. Although further work is required to address KG biases and extend generalizability, DR.KNOWS represents progress toward trustworthy artificial intelligence-driven clinical decision support, with a human evaluation framework focused on diagnostic safety and alignment with clinical standards.