Knowledge Enhanced Conditional Imputation for Healthcare Time-series

📅 2023-12-27
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
To address complex, non-random missingness in multivariate time series from electronic health records (EHRs), this paper proposes the Conditional Self-Attention Imputation (CSAI) model, a recurrent neural network architecture. CSAI introduces three modifications specific to EHR data: (1) attention-based initialization of the hidden state, to capture both long- and short-range temporal dependencies; (2) domain-informed temporal decay that mimics clinical data recording patterns; and (3) a non-uniform masking strategy that models non-random missingness by calibrating weights according to both temporal and cross-sectional data characteristics. Experiments on four benchmark EHR datasets show that CSAI is effective compared with state-of-the-art architectures in both data restoration and downstream tasks. The model is integrated into PyPOTS, an open-source Python toolbox for machine learning on partially observed time series.
📝 Abstract
We introduce the Conditional Self-Attention Imputation (CSAI) model, a novel recurrent neural network architecture designed to address the challenges of complex missing data patterns in multivariate time series derived from hospital electronic health records (EHRs). CSAI extends state-of-the-art neural network-based imputation by introducing key modifications specific to EHR data: a) attention-based hidden state initialisation to capture both long- and short-range temporal dependencies prevalent in EHRs, b) domain-informed temporal decay to mimic clinical data recording patterns, and c) a non-uniform masking strategy that models non-random missingness by calibrating weights according to both temporal and cross-sectional data characteristics. Comprehensive evaluation across four EHR benchmark datasets demonstrates CSAI's effectiveness compared to state-of-the-art architectures in data restoration and downstream tasks. CSAI is integrated into PyPOTS, an open-source Python toolbox designed for machine learning tasks on partially observed time series. This work significantly advances the state of neural network imputation applied to EHRs by more closely aligning algorithmic imputation with clinical realities.
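To make the idea of temporal decay concrete, the following is a minimal sketch of the conventional formulation used by recurrent imputers such as GRU-D and BRITS: a per-feature time gap since the last observation, passed through an exponential decay. It is not CSAI's domain-informed variant, which the abstract does not specify. The function names `time_gaps` and `decay_factor`, and the scalar parameters `w` and `b` (learned weights in a real model), are illustrative assumptions.

```python
import math


def time_gaps(mask, timestamps):
    """Per-feature time since the last observation (GRU-D/BRITS convention).

    mask[t][d] is 1 if feature d was observed at step t, else 0.
    timestamps[t] is the recording time of step t.
    """
    n_steps, n_features = len(mask), len(mask[0])
    delta = [[0.0] * n_features for _ in range(n_steps)]
    for t in range(1, n_steps):
        step = timestamps[t] - timestamps[t - 1]
        for d in range(n_features):
            if mask[t - 1][d] == 1:
                # Previous step was observed: the gap restarts.
                delta[t][d] = step
            else:
                # Previous step was missing: the gap keeps accumulating.
                delta[t][d] = step + delta[t - 1][d]
    return delta


def decay_factor(delta, w=1.0, b=0.0):
    """gamma = exp(-max(0, w * delta + b)); w and b are placeholder scalars."""
    return math.exp(-max(0.0, w * delta + b))
```

In this sketch the decay factor shrinks as the gap grows, so a value last observed long ago contributes less to the hidden state than a recent one.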
Problem

Research questions and friction points this paper is trying to address.

Imputes missing data in hospital EHR time series
Handles complex non-random missingness patterns in clinical data
Improves data restoration for downstream healthcare analytics tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conditional self-attention imputation for time-series data
Attention-based initialization for temporal dependencies
Domain-informed decay and non-uniform masking strategy
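As a rough illustration of what a non-uniform masking strategy could look like, the sketch below holds out observed values for training with a probability that differs per feature, instead of a single uniform rate. The weighting rule (features that are missing more often are masked more often) is an assumption chosen for illustration, not the calibration used in the paper, and the names `feature_missing_rates`, `non_uniform_mask`, and `base_rate` are hypothetical.

```python
import random


def feature_missing_rates(mask):
    """Fraction of time steps at which each feature is missing."""
    n_steps, n_features = len(mask), len(mask[0])
    return [1.0 - sum(row[d] for row in mask) / n_steps for d in range(n_features)]


def non_uniform_mask(mask, base_rate=0.1, seed=0):
    """Select observed entries to hold out, with a per-feature probability.

    Returns a matrix with 1 where an observed value is artificially masked.
    """
    rng = random.Random(seed)
    rates = feature_missing_rates(mask)
    mean_rate = sum(rates) / len(rates)
    # Assumed weighting: scale the base rate by each feature's relative missingness.
    weights = [r / mean_rate if mean_rate > 0 else 1.0 for r in rates]
    held_out = [[0] * len(row) for row in mask]
    for t, row in enumerate(mask):
        for d, observed in enumerate(row):
            p = min(1.0, base_rate * weights[d])
            # Only values that were actually observed can be held out.
            if observed == 1 and rng.random() < p:
                held_out[t][d] = 1
    return held_out
```

A uniform strategy would use the same `p` for every feature; the per-feature weights are the only difference here, which keeps the comparison between the two easy to see.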
Authors
Linglong Qian
Department of Biostatistics and Health Informatics, King’s College London, London, UK
Zina M. Ibrahim
Department of Biostatistics and Health Informatics, King’s College London, London, UK
Hugh Logan Ellis
Department of Biostatistics and Health Informatics, King’s College London, London, UK
Ao Zhang
Northwestern Polytechnical University
keyword spotting, Automatic Speech Recognition
Yuezhou Zhang
Department of Biostatistics and Health Informatics, King’s College London, London, UK
Tao Wang
Department of Biostatistics and Health Informatics, King’s College London, London, UK
Richard J. B. Dobson
Department of Biostatistics and Health Informatics, King’s College London, London, UK