🤖 AI Summary
This study addresses two key challenges in modeling longitudinal multimodal medical data: (1) static anatomical structures in sequential chest X-rays obscure dynamic pathological changes, and (2) sparsely and irregularly sampled imaging is temporally asynchronous with electronic health records (EHR). To tackle these, we propose a region-aware spatiotemporal disentanglement mechanism that explicitly separates static anatomical features from dynamic pathological patterns in X-ray sequences. We also design a hierarchical temporal alignment framework that jointly models EHR and imaging data at both local (interval-level) and global (sequence-level) granularity. Evaluated on the MIMIC-CXR and MIMIC-IV datasets, our method significantly improves disease progression identification and critical illness prediction, achieving AUC gains of 3.2–5.7 percentage points over baselines. The approach offers a robust and clinically interpretable paradigm for longitudinal multimodal medical data analysis.
📝 Abstract
Longitudinal multimodal data, including electronic health records (EHR) and sequential chest X-rays (CXRs), is critical for modeling disease progression, yet remains underutilized due to two key challenges: (1) redundancy in consecutive CXR sequences, where static anatomical regions dominate over clinically meaningful dynamics, and (2) temporal misalignment between sparse, irregular imaging and continuous EHR data. We introduce $\texttt{DiPro}$, a novel framework that addresses these challenges through region-aware disentanglement and multi-timescale alignment. First, we disentangle static (anatomy) and dynamic (pathology progression) features in sequential CXRs, prioritizing disease-relevant changes. Second, we hierarchically align these static and dynamic CXR features with asynchronous EHR data via local (pairwise interval-level) and global (full-sequence) synchronization to model coherent progression pathways. Extensive experiments on the MIMIC dataset demonstrate that $\texttt{DiPro}$ effectively extracts temporal clinical dynamics and achieves state-of-the-art performance on both disease progression identification and general ICU prediction tasks.
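The local (pairwise interval-level) and global (full-sequence) alignment described above can be illustrated with a small sketch of the data grouping step only. This is not the DiPro implementation: the function names `local_windows` and `global_window`, the use of hours as the time unit, and the `(start, end]` interval convention are all assumptions made for illustration, and the learned feature alignment itself is not shown.

```python
from bisect import bisect_left, bisect_right


def local_windows(cxr_times, ehr_events):
    """Group EHR events into the intervals between consecutive CXRs.

    cxr_times: sorted CXR acquisition times (assumed unit: hours).
    ehr_events: (time, value) tuples sorted by time.
    Returns one list of events per consecutive CXR pair, using the
    assumed convention that an interval covers (start, end].
    """
    times = [t for t, _ in ehr_events]
    windows = []
    for start, end in zip(cxr_times, cxr_times[1:]):
        lo = bisect_right(times, start)  # first event strictly after start
        hi = bisect_right(times, end)    # one past the last event <= end
        windows.append(ehr_events[lo:hi])
    return windows


def global_window(cxr_times, ehr_events):
    """Collect all EHR events from the first to the last CXR, inclusive."""
    times = [t for t, _ in ehr_events]
    lo = bisect_left(times, cxr_times[0])
    hi = bisect_right(times, cxr_times[-1])
    return ehr_events[lo:hi]


# Hypothetical example: three CXRs and irregularly sampled heart-rate readings.
cxr_times = [0.0, 12.0, 30.0]
ehr_events = [(-1.0, 90), (0.0, 88), (3.5, 95), (12.0, 101), (20.0, 97), (31.0, 85)]

local = local_windows(cxr_times, ehr_events)
full = global_window(cxr_times, ehr_events)
```

Here each local window would pair with the change between two consecutive CXRs, while the global window would pair with the whole imaging sequence; events outside the imaging span (at -1.0 and 31.0 hours in this example) are left out of both.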