Bridging Stepwise Lab-Informed Pretraining and Knowledge-Guided Learning for Diagnostic Reasoning

📅 2024-10-25
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing EHR-driven diagnostic models lack physician-like stepwise reasoning and interpretability. Method: We propose DuaLK, a dual-expertise framework comprising (1) an LLM-enhanced diagnostic knowledge graph that integrates structured medical knowledge with semantic relations, and (2) a laboratory-test-guided stepwise pretraining task that explicitly models clinical decision pathways. The method unifies knowledge graph construction, LLM-based semantic alignment, lab-signal-driven proxy tasks, and multi-task diagnostic prediction. Results: On four clinical prediction tasks across two public EHR datasets, DuaLK consistently outperforms state-of-the-art baselines, improving both predictive accuracy (average +3.2% AUC) and reasoning interpretability (86.4% inter-annotator agreement in human evaluation). This work points toward a paradigm for knowledge-augmented, clinically grounded AI reasoning.

📝 Abstract
Despite the growing use of Electronic Health Records (EHRs) for AI-assisted diagnosis prediction, most data-driven models struggle to incorporate clinically meaningful medical knowledge. They often rely on limited ontologies, lacking structured reasoning capabilities and comprehensive coverage. This raises an important research question: Can medical knowledge improve predictive models so that they support the stepwise clinical reasoning performed by human doctors? To address this problem, we propose DuaLK, a dual-expertise framework that combines two complementary sources of information. For external knowledge, we construct a Diagnosis Knowledge Graph (KG) that encodes both hierarchical and semantic relations enriched by large language models (LLMs). To align with patient data, we further introduce a lab-informed proxy task that guides the model to follow a clinically consistent, stepwise reasoning process based on lab test signals. Experimental results on two public EHR datasets demonstrate that DuaLK consistently outperforms existing baselines across four clinical prediction tasks. These findings highlight the potential of combining structured medical knowledge with individual-level clinical signals to achieve more accurate and interpretable diagnostic predictions. The source code is publicly available at https://github.com/humphreyhuu/DuaLK.
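
To make the idea of a diagnosis KG with both hierarchical and semantic relations concrete, here is a minimal Python sketch. It is not the DuaLK implementation: the `DiagnosisKG` class, the relation names `is_a` and `may_cause`, and the example codes are all assumptions chosen for illustration, and the semantic edge is hand-written here to stand in for a relation that an LLM might suggest.

```python
from collections import defaultdict


class DiagnosisKG:
    """Toy knowledge graph storing (head, relation) -> set of tails.

    Hypothetical structure for illustration; not the paper's code.
    """

    def __init__(self):
        self.edges = defaultdict(set)

    def add_edge(self, head, relation, tail):
        self.edges[(head, relation)].add(tail)

    def neighbors(self, code, relation):
        # Use .get so that lookups do not create empty entries.
        return sorted(self.edges.get((code, relation), set()))

    def ancestors(self, code):
        """Follow 'is_a' edges upward, nearest ancestor first."""
        path, seen, current = [], {code}, code
        while True:
            parents = self.neighbors(current, "is_a")
            if not parents or parents[0] in seen:
                break  # reached the root, or guarding against a cycle
            current = parents[0]
            seen.add(current)
            path.append(current)
        return path


kg = DiagnosisKG()
# Hierarchical relations (ICD-style code hierarchy, illustrative subset).
kg.add_edge("E11.9", "is_a", "E11")
kg.add_edge("E11", "is_a", "E08-E13")
# Semantic relation (assumed example of an LLM-suggested edge).
kg.add_edge("E11", "may_cause", "N18")

hierarchy = kg.ancestors("E11.9")
related = kg.neighbors("E11", "may_cause")
```

Keeping the two relation types in one store lets a downstream encoder treat the code hierarchy and the semantic links uniformly while still distinguishing them by relation name.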
Problem

Research questions and friction points this paper is trying to address.

Incorporating medical knowledge into AI diagnostic models
Enhancing structured reasoning in EHR-based predictions
Aligning lab data with clinical reasoning processes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-expertise framework combines knowledge and data
Diagnosis Knowledge Graph enriched by large language models
Lab-informed proxy task guides stepwise clinical reasoning
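
One way to picture a lab-informed proxy task is as an intermediate prediction target derived from lab results, which a model learns before predicting diagnoses. The Python sketch below builds such a target as a vector of abnormality flags. The summary does not specify how DuaLK defines its proxy task, so the function `lab_proxy_target`, the choice of labs, and the reference ranges are all hypothetical and serve only to illustrate the general idea.

```python
# Illustrative reference ranges (low, high); assumed values, not from the paper.
REFERENCE_RANGES = {
    "glucose": (70.0, 99.0),      # mg/dL
    "creatinine": (0.6, 1.3),     # mg/dL
    "hemoglobin": (12.0, 17.5),   # g/dL
}
# Fixed ordering so every visit maps to a vector of the same layout.
LAB_ORDER = sorted(REFERENCE_RANGES)


def lab_proxy_target(labs):
    """Return one 0/1 flag per lab: 1 if measured and out of range.

    Labs that were not measured in the visit are flagged 0.
    """
    target = []
    for name in LAB_ORDER:
        value = labs.get(name)
        if value is None:
            target.append(0)
            continue
        low, high = REFERENCE_RANGES[name]
        target.append(int(value < low or value > high))
    return target


# A hypothetical visit with elevated glucose and normal creatinine.
visit = {"glucose": 182.0, "creatinine": 1.0}
proxy = lab_proxy_target(visit)  # ordered as creatinine, glucose, hemoglobin
```

A model pretrained to predict `proxy` from earlier visit data would have to attend to lab signals before making a diagnosis, which is the stepwise behavior the proxy task is meant to encourage.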
Authors

Pengfei Hu
Department of Computer Science, Stevens Institute of Technology
Chang Lu
Department of Computer Science, Stevens Institute of Technology
Fei Wang
Department of Population Health Sciences, Weill Cornell Medicine
Yue Ning
Stevens Institute of Technology
Data Analytics, Knowledge Discovery, Machine Learning