Inference of Dependency Knowledge Graph for Electronic Health Records

📅 2023-12-25

📈 Citations: 3

✨ Influential: 0

🤖 AI Summary

To address the lack of statistical guarantees in knowledge graph (KG) construction from high-dimensional electronic health records (EHRs), particularly when privacy concerns limit access to patient-level data, this paper proposes the first inferential framework for recovering the edges of a sparse dependency KG. The method builds on a dynamic log-linear topic model with low-rank temporal dependence: KG embeddings are estimated by singular value decomposition of the empirical pointwise mutual information (PMI) matrix, and an entrywise asymptotic normality result for the low-rank estimator enables edge significance testing with controlled type I error. The work fills a gap in the theory of inference for nonlinear statistics under low-rank, temporally dependent models. The approach is validated in simulation studies and then applied to real-world EHR data to construct clinical KGs and generate clinical feature embeddings.

📝 Abstract

The effective analysis of high-dimensional Electronic Health Record (EHR) data, with substantial potential for healthcare research, presents notable methodological challenges. Employing predictive modeling guided by a knowledge graph (KG), which enables efficient feature selection, can enhance both statistical efficiency and interpretability. While various methods have emerged for constructing KGs, existing techniques often lack statistical certainty concerning the presence of links between entities, especially in scenarios where the utilization of patient-level EHR data is limited due to privacy concerns. In this paper, we propose the first inferential framework for deriving a sparse KG with statistical guarantee based on the dynamic log-linear topic model proposed by Arora et al. (2016). Within this model, the KG embeddings are estimated by performing singular value decomposition on the empirical pointwise mutual information matrix, offering a scalable solution. We then establish entrywise asymptotic normality for the KG low-rank estimator, enabling the recovery of sparse graph edges with controlled type I error. Our work uniquely addresses the under-explored domain of statistical inference about non-linear statistics under the low-rank temporal dependent models, a critical gap in existing research. We validate our approach through extensive simulation studies and then apply the method to real-world EHR data in constructing clinical KGs and generating clinical feature embeddings.
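The embedding step described in the abstract (singular value decomposition of an empirical PMI matrix) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the paper: the function names `cooccurrence`, `pmi_matrix`, and `low_rank_factorization`, the symmetric window-based co-occurrence count, the additive smoothing that avoids taking the log of zero, and the toy code sequences. Note that only aggregate co-occurrence counts enter the computation, which is what makes this kind of estimator usable when patient-level data cannot be shared.

```python
import numpy as np

def cooccurrence(seqs, vocab_size, window):
    """Symmetric co-occurrence counts of codes within `window` positions."""
    C = np.zeros((vocab_size, vocab_size))
    for seq in seqs:
        for t, w in enumerate(seq):
            for u in range(t + 1, min(t + window + 1, len(seq))):
                v = seq[u]
                C[w, v] += 1.0
                C[v, w] += 1.0
    return C

def pmi_matrix(C, smooth=1.0):
    """Empirical PMI: log( p(w, v) / (p(w) p(v)) ).

    Additive smoothing is an illustrative choice to keep the log finite.
    """
    Cs = C + smooth
    total = Cs.sum()
    row = Cs.sum(axis=1, keepdims=True)
    col = Cs.sum(axis=0, keepdims=True)
    return np.log(Cs * total / (row * col))

def low_rank_factorization(pmi, rank):
    """Truncated SVD: returns embeddings and the rank-`rank` approximation."""
    U, s, Vt = np.linalg.svd(pmi)
    emb = U[:, :rank] * np.sqrt(s[:rank])            # one row per code
    approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]    # low-rank PMI estimate
    return emb, approx

# Toy data: each inner list is a hypothetical sequence of integer code ids.
seqs = [[0, 1, 2, 1], [2, 3, 0], [1, 2, 3, 3]]
C = cooccurrence(seqs, vocab_size=4, window=2)
pmi = pmi_matrix(C)
emb, pmi_low_rank = low_rank_factorization(pmi, rank=2)
print(emb.shape, pmi_low_rank.shape)
```

In this sketch the rows of `emb` play the role of feature embeddings, while `pmi_low_rank` is the denoised matrix whose entries would be the natural targets for entrywise inference.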

Problem

Research questions and friction points this paper is trying to address.

Infer sparse knowledge graphs from EHR data with statistical guarantees

Address lack of statistical certainty in existing knowledge graph construction methods

Enable clinical feature selection and embedding under privacy constraints

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic log-linear topic model for KG inference

Singular value decomposition on PMI matrix

Entrywise asymptotic normality for sparse graph edges
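To show how entrywise asymptotic normality translates into edge recovery, the following is a minimal sketch of a per-entry two-sided z-test. It assumes that an estimate matrix and a matching matrix of standard errors are already available; the paper's actual variance estimator is not reproduced here, and the names `edge_pvalues` and `select_edges`, the toy numbers, and the level `alpha=0.05` are all hypothetical.

```python
import math
import numpy as np

def edge_pvalues(estimate, std_err):
    """Two-sided normal p-values for H0: entry == 0, computed entrywise."""
    z = estimate / std_err
    two_sided = np.vectorize(lambda v: math.erfc(abs(v) / math.sqrt(2.0)))
    return two_sided(z)

def select_edges(estimate, std_err, alpha=0.05):
    """Boolean adjacency matrix: True where the entry is significant."""
    adj = edge_pvalues(estimate, std_err) < alpha
    np.fill_diagonal(adj, False)  # ignore self-loops
    return adj

# Hypothetical 3-node example with unit standard errors.
estimate = np.array([[0.0, 3.0, 0.1],
                     [3.0, 0.0, 0.1],
                     [0.1, 0.1, 0.0]])
std_err = np.ones_like(estimate)
adj = select_edges(estimate, std_err, alpha=0.05)
print(adj)
```

Each entry is tested separately at level `alpha`, so this controls the per-edge type I error only; testing many edges at once would need an additional multiplicity correction, which is left out of the sketch.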

🔎 Similar Papers

medIKAL: Integrating Knowledge Graphs as Assistants of LLMs for Enhanced Clinical Diagnosis on EMRs