🤖 AI Summary
Problem: Existing EHR-driven diagnostic models lack physician-like stepwise reasoning and interpretability. Method: We propose DuaLK, a dual-expertise framework comprising (1) an LLM-enhanced diagnosis knowledge graph that integrates hierarchical medical knowledge with semantic relations, and (2) a lab-test-guided proxy task, used in pretraining, that steers the model along stepwise clinical decision pathways. The method combines knowledge graph construction, LLM-based semantic enrichment, lab-signal-driven proxy tasks, and multi-task diagnostic prediction. Results: On four clinical prediction tasks across two public EHR datasets, DuaLK consistently outperforms state-of-the-art baselines, with improvements in predictive accuracy (average +3.2% AUC) and reasoning interpretability (86.4% inter-annotator agreement in human evaluation). The work points toward knowledge-augmented, clinically grounded diagnostic prediction.
📝 Abstract
Despite the growing use of Electronic Health Records (EHRs) for AI-assisted diagnosis prediction, most data-driven models struggle to incorporate clinically meaningful medical knowledge. They often rely on limited ontologies that lack structured reasoning capabilities and comprehensive coverage. This raises an important research question: Can medical knowledge improve predictive models so that they support the stepwise clinical reasoning performed by human doctors? To address this question, we propose DuaLK, a dual-expertise framework that combines two complementary sources of information. For external knowledge, we construct a Diagnosis Knowledge Graph (KG) that encodes both hierarchical and semantic relations, enriched by large language models (LLMs). To align with patient data, we further introduce a lab-informed proxy task that guides the model to follow a clinically consistent, stepwise reasoning process based on lab test signals. Experimental results on two public EHR datasets demonstrate that DuaLK consistently outperforms existing baselines across four clinical prediction tasks. These findings highlight the potential of combining structured medical knowledge with individual-level clinical signals to achieve more accurate and interpretable diagnostic predictions. The source code is publicly available at https://github.com/humphreyhuu/DuaLK.