PRISM: Mitigating EHR Data Sparsity via Learning from Missing Feature Calibrated Prototype Patient Representations

📅 2023-09-08

🏛️ International Conference on Information and Knowledge Management

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

EHR data are highly sparse, which distorts patient representations; conventional imputation methods fail to distinguish between observed and imputed values, thereby degrading predictive performance. To address this, the authors propose a prototype-driven, missingness-aware representation framework that avoids direct imputation. Instead, it models missing features indirectly by leveraging prototypes derived from clinically similar patients. The framework introduces a missingness-aware similarity metric and a feature confidence learning module that estimates the reliability of each clinical feature, reducing reliance on imprecise imputed values. Experiments on four public datasets (MIMIC-III, MIMIC-IV, PhysioNet Challenge 2012, and eICU) show that the method outperforms existing imputation and representation methods in predicting in-hospital mortality and 30-day readmission. The implementation is publicly available.

📝 Abstract

Electronic Health Records (EHRs) contain a wealth of patient data; however, the sparsity of EHR data often presents significant challenges for predictive modeling. Conventional imputation methods inadequately distinguish between real and imputed data, leading to potential inaccuracies in patient representations. To address these issues, we introduce PRISM, a framework that indirectly imputes data by leveraging prototype representations of similar patients, thus ensuring compact representations that preserve patient information. PRISM also includes a feature confidence learner module, which evaluates the reliability of each feature considering its missing status. Additionally, PRISM introduces a new patient similarity metric that accounts for feature confidence, avoiding over-reliance on imprecise imputed values. Our extensive experiments on the MIMIC-III, MIMIC-IV, PhysioNet Challenge 2012, and eICU datasets demonstrate PRISM's superior performance on the in-hospital mortality and 30-day readmission prediction tasks, showcasing its effectiveness in handling EHR data sparsity. For the sake of reproducibility and further research, we have publicly released the code at https://github.com/yhzhu99/PRISM.
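To make the idea of a confidence-aware similarity metric concrete, here is a minimal sketch in plain Python. It is not the formula used in PRISM: the exponential decay rule, the `decay` parameter, and the function names `feature_confidence` and `confidence_weighted_cosine` are assumptions made purely for illustration. The point is only that down-weighting imputed features keeps them from dominating the comparison between two patients.

```python
import math

def feature_confidence(observed, steps_since_observed, decay=0.5):
    # Hypothetical rule (an assumption, not the paper's learner):
    # observed values get confidence 1.0; imputed values lose
    # confidence exponentially with the time since the last real measurement.
    return [
        1.0 if is_observed else math.exp(-decay * steps)
        for is_observed, steps in zip(observed, steps_since_observed)
    ]

def confidence_weighted_cosine(x, y, conf_x, conf_y):
    # Each feature is weighted by the product of both patients' confidences,
    # so a feature that is unreliable for either patient contributes little.
    weights = [cx * cy for cx, cy in zip(conf_x, conf_y)]
    dot = sum(w * a * b for w, a, b in zip(weights, x, y))
    norm_x = math.sqrt(sum(w * a * a for w, a in zip(weights, x)))
    norm_y = math.sqrt(sum(w * b * b for w, b in zip(weights, y)))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # no reliable shared features to compare
    return dot / (norm_x * norm_y)

# Two patients who agree on the observed feature but disagree on a
# feature that was imputed (confidence 0.0) for the second patient.
patient_a = [1.0, 0.0, 5.0]
patient_b = [1.0, 0.0, -5.0]
full_conf = [1.0, 1.0, 1.0]
low_conf = [1.0, 1.0, 0.0]

plain = confidence_weighted_cosine(patient_a, patient_b, full_conf, full_conf)
aware = confidence_weighted_cosine(patient_a, patient_b, full_conf, low_conf)
```

In this toy case the plain cosine similarity is strongly negative because of the imputed third feature, while the confidence-weighted version compares the patients only on what was actually measured.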

Problem

Research questions and friction points this paper is trying to address.

Mitigating EHR data sparsity

Improving patient representation accuracy

Enhancing predictive modeling in healthcare

Innovation

Methods, ideas, or system contributions that make the work stand out.

Prototype patient representations

Feature confidence learner

New patient similarity metric
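As a rough illustration of how prototype patient representations could stand in for direct imputation, the sketch below blends a patient embedding with a similarity-weighted mixture of prototype vectors. Everything here is hypothetical: the dot-product scoring, the softmax weighting, the blend factor `alpha`, and the function name `refine_with_prototypes` are illustrative assumptions, and the released PRISM code should be consulted for the actual architecture.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def refine_with_prototypes(patient, prototypes, alpha=0.5):
    # Assumed scheme: score each prototype by dot product with the patient,
    # turn scores into weights, and mix the prototypes accordingly.
    weights = softmax([dot(patient, proto) for proto in prototypes])
    mixture = [
        sum(w * proto[i] for w, proto in zip(weights, prototypes))
        for i in range(len(patient))
    ]
    # alpha=1.0 keeps the original embedding; alpha=0.0 uses prototypes only.
    return [alpha * x + (1.0 - alpha) * m for x, m in zip(patient, mixture)]

# Hypothetical prototypes, e.g. centroids of clusters of similar patients.
prototypes = [[2.0, 4.0]]
refined = refine_with_prototypes([0.0, 0.0], prototypes, alpha=0.5)
```

With a single prototype the mixture is that prototype itself, so the refined vector is the midpoint between the patient embedding and the prototype. With several prototypes, the ones most similar to the patient dominate the mixture.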

🔎 Similar Papers

Learnable Prompt as Pseudo-Imputation: Rethinking the Necessity of Traditional EHR Data Imputation in Downstream Clinical Prediction