🤖 AI Summary
This work addresses the clinical reliability of missing value imputation in electronic health record (EHR) time series. We systematically evaluate diverse deep temporal models—including RNNs, Transformers, and GNN variants—assessing their capacity to capture clinically meaningful temporal dependencies. We propose a medical-prior-informed bias analysis framework that reveals how architectural choices and implementation details (e.g., preprocessing) jointly affect clinical dependency modeling. Contrary to common assumptions, increased model complexity does not guarantee improved performance; lightweight, domain-adapted architectures significantly outperform parameter-heavy baselines. Experiments show preprocessing variations induce up to 20% performance fluctuation, underscoring the need for standardized benchmarks and tighter integration of clinical knowledge. Our key contributions are: (1) establishing a clinical-meaning-first evaluation paradigm over statistical accuracy alone; (2) identifying fundamental limitations in current methods—poor interpretability, fragility under sparse sampling, and inability to capture event-driven dynamics; and (3) providing a reproducible, interpretable, and clinically trustworthy methodology for medical time-series imputation.
📝 Abstract
We present a comprehensive analysis of deep learning approaches for Electronic Health Record (EHR) time-series imputation, examining how architectural and framework biases combine to influence model performance. Our investigation reveals varying capabilities of deep imputers in capturing complex spatiotemporal dependencies within EHRs, and that model effectiveness depends on how its combined biases align with medical time-series characteristics. Our experimental evaluation challenges common assumptions about model complexity, demonstrating that larger models do not necessarily improve performance. Rather, carefully designed architectures can better capture the complex patterns inherent in clinical data. The study highlights the need for imputation approaches that prioritise clinically meaningful data reconstruction over statistical accuracy. Our experiments show imputation performance variations of up to 20% based on preprocessing and implementation choices, emphasising the need for standardised benchmarking methodologies. Finally, we identify critical gaps between current deep imputation methods and medical requirements, highlighting the importance of integrating clinical insights to achieve more reliable imputation approaches for healthcare applications.