🤖 AI Summary
This study addresses the challenge of forecasting future chest X-ray images to quantify longitudinal disease progression, thereby supporting clinical decision-making and dynamic patient monitoring. Methodologically, it introduces the first latent diffusion model that explicitly incorporates structured temporal clinical events, such as laboratory values and medication records, as conditioning signals. It proposes a multimodal conditional encoder that jointly processes chest X-rays and electronic health record (EHR) sequences, and it integrates time-aware attention to align the two modalities dynamically over time. Quantitative and qualitative evaluations indicate that the model significantly outperforms existing baselines in the clinical consistency, demographic consistency, and visual fidelity of the generated images. This work establishes a novel, interpretable, and clinically verifiable paradigm for future medical image prediction, enabling visualization of disease evolution and facilitating early risk stratification.
📝 Abstract
Chest X-ray (CXR) is an important diagnostic tool widely used in hospitals to assess patient conditions and monitor changes over time. Recently, generative models, specifically diffusion-based models, have shown promise in generating realistic synthetic CXRs. However, these models mainly focus on conditional generation from single-time-point data, i.e., generating a CXR conditioned on its corresponding report from a specific time. This limits their clinical utility, particularly for capturing temporal changes. To address this limitation, we propose a novel framework, EHRXDiff, which predicts future CXR images by integrating previous CXRs with subsequent medical events such as prescriptions and lab measures. Our framework dynamically tracks and predicts disease progression using a latent diffusion model conditioned on the previous CXR image and a history of medical events. We comprehensively evaluate the framework on three key aspects: clinical consistency, demographic consistency, and visual realism. Results show that our framework generates high-quality, realistic future images that effectively capture potential temporal changes. This suggests that the framework could be further developed to support clinical decision-making and provide valuable insights for patient monitoring and treatment planning. The code is available at https://github.com/dek924/EHRXDiff.
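As a rough illustration of what "a latent diffusion model conditioned on the previous CXR image and a history of medical events" can look like at training time, the sketch below runs one noise-prediction step on toy vectors. It is not taken from the EHRXDiff code. The function names, the linear noise schedule, the mean-pooling of event embeddings, and the one-layer `toy_denoiser` are all assumptions made for illustration; a real system would use a learned image autoencoder, a learned event encoder, and a neural denoising network.

```python
import math
import random


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for a linear noise schedule (assumed)."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def add_noise(z0, t, alpha_bar, rng):
    """Sample z_t from q(z_t | z_0) and return it with the noise that was added."""
    a = alpha_bar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    z_t = [math.sqrt(a) * z + math.sqrt(1.0 - a) * e for z, e in zip(z0, eps)]
    return z_t, eps


def build_condition(prev_latent, event_embeddings):
    """Hypothetical conditioning: previous-image latent + mean-pooled event embeddings."""
    n = len(event_embeddings)
    pooled = [sum(column) / n for column in zip(*event_embeddings)]
    return list(prev_latent) + pooled


def toy_denoiser(z_t, cond, t, num_steps, weights):
    """Stand-in for the denoising network: a single linear layer over all inputs."""
    features = list(z_t) + list(cond) + [t / num_steps]
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def training_loss(z0_future, prev_latent, event_embeddings, weights, alpha_bar, rng):
    """Mean squared error between the true and predicted noise at a random timestep."""
    t = rng.randrange(len(alpha_bar))
    z_t, eps = add_noise(z0_future, t, alpha_bar, rng)
    cond = build_condition(prev_latent, event_embeddings)
    eps_hat = toy_denoiser(z_t, cond, t, len(alpha_bar), weights)
    return sum((p - e) ** 2 for p, e in zip(eps_hat, eps)) / len(eps)


# Toy sizes; real latents and event sequences would be far larger.
LATENT_DIM, EVENT_DIM, NUM_EVENTS = 16, 8, 5

rng = random.Random(0)
alpha_bar = make_alpha_bar()
z0_future = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]    # latent of the future CXR
prev_latent = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]  # latent of the previous CXR
events = [[rng.gauss(0.0, 1.0) for _ in range(EVENT_DIM)] for _ in range(NUM_EVENTS)]

in_dim = LATENT_DIM + (LATENT_DIM + EVENT_DIM) + 1
weights = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(LATENT_DIM)]

loss = training_loss(z0_future, prev_latent, events, weights, alpha_bar, rng)
print(f"noise-prediction loss: {loss:.4f}")
```

In this sketch the condition is simply concatenated to the noisy latent. Other conditioning mechanisms, such as cross-attention over the event sequence, are equally plausible, and the abstract does not specify which one the framework uses.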