🤖 AI Summary
Missing values in multivariate time series, particularly electronic health records (EHRs), often arise from sensor disconnections, yet existing imputation methods neglect model uncertainty, hindering reliable identification of low-confidence estimates. Method: We propose the first general-purpose imputation framework integrating uncertainty quantification: it employs probabilistic modeling techniques (e.g., Monte Carlo Dropout) to compute per-value confidence scores and introduces an adaptive, threshold-driven selective imputation mechanism that fills only the missing entries estimated with sufficiently high confidence. Contribution/Results: Evaluated across multiple real-world EHR datasets, the approach significantly reduces imputation error and improves downstream clinical prediction performance (e.g., 24-hour mortality prediction), demonstrating both the efficacy and clinical utility of uncertainty-aware imputation.
📝 Abstract
Time series data with missing values is common across many domains. Healthcare presents special challenges due to prolonged periods of sensor disconnection; in such cases, having a confidence measure for imputed values is critical. Most existing methods either overlook model uncertainty or lack mechanisms to estimate it. To address this gap, we introduce a general framework that quantifies and leverages uncertainty for selective imputation: by imputing only the values the model is most confident in, the framework avoids highly unreliable estimates. Our experiments on multiple EHR datasets, covering diverse types of missingness, demonstrate that selectively imputing less-uncertain values not only reduces imputation error but also improves downstream tasks. In particular, we show performance gains on a 24-hour mortality prediction task, underscoring the practical benefit of incorporating uncertainty into time series imputation.
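The selective, uncertainty-aware imputation described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy stochastic predictor (a fixed linear map with input dropout standing in for a trained imputation network), the dropout rate `p`, and the confidence threshold `max_std` are all assumptions chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(42)

def mc_dropout_impute(predict, x, mask, rng, n_samples=100, max_std=0.5):
    """Selective imputation via Monte Carlo sampling.

    predict(x, rng) must be stochastic (e.g., dropout kept active).
    Only missing entries (mask == 0) whose predictive std is at most
    `max_std` are filled; high-uncertainty entries are left missing.
    """
    samples = np.stack([predict(x, rng) for _ in range(n_samples)])
    mean, std = samples.mean(axis=0), samples.std(axis=0)
    confident = (mask == 0) & (std <= max_std)  # low-uncertainty missing entries
    out = x.copy()
    out[confident] = mean[confident]
    return out, std

# Toy stochastic predictor: a fixed linear map with input dropout (p=0.2),
# a stand-in for a trained imputation network with MC Dropout enabled.
W = rng.normal(size=(8, 8)) * 0.1

def predict(x, rng, p=0.2):
    keep = rng.random(x.shape) > p          # random dropout mask per sample
    return (x * keep / (1 - p)) @ W         # inverted-dropout scaling

mask = (rng.random((4, 8)) > 0.3).astype(float)  # 1 = observed, 0 = missing
x = rng.normal(size=(4, 8)) * mask               # zeros at missing positions
imputed, std = mc_dropout_impute(predict, x, mask, rng)
```

Observed entries pass through untouched; each missing entry is either filled with the Monte Carlo mean (when the sample standard deviation is below the threshold) or deliberately left missing, which is the selective behavior the framework relies on.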