🤖 AI Summary
This work addresses the widespread neglect of social determinants of health (SDoH) in existing generative models for disease prediction, which limits their capacity for personalized clinical reasoning. We propose a conditional latent diffusion framework that integrates SDoH-informed digital twins with multimodal longitudinal generative modeling. SDoH are explicitly modeled via proxy variables derived from ICD codes and jointly integrated with multi-organ sensor data and electronic health record sequences, enabling unified cross-modal representation across imaging, graph, and tabular data. A novel geometric diffusion mechanism captures the temporal evolution of complex structured data such as brain networks, supporting disease trajectory simulation and counterfactual intervention reasoning. Evaluated on large-scale UK Biobank data, the model significantly outperforms current autoregressive and imaging-based generative baselines across multiple organ systems, including the brain, heart, liver, and kidneys.
📝 Abstract
Despite the central role of sensor-derived measurements such as imaging traits and plasma biomarkers in biomedical research and clinical practice, existing generative models for disease prediction largely depend on event-level representations from hospital and registry data. Given the multi-factorial nature of human disease, the absence of explicit modeling of social determinants of health (SDoH), even in the limited form of ICD-coded proxies (chapters Z and V--Y in ICD-10), limits the capacity for personalized disease modeling and clinical decision support. To address this limitation, we propose a generative model with ICD-coded proxies of SDoH for \textit{in silico} disease reasoning: a conditional latent diffusion framework that connects multi-organ sensor data with tokenized healthcare events. Specifically, we introduce a novel geometric diffusion model to characterize the temporal evolution of complex data representations such as brain networks (region-to-region connectivity encoded as a graph), in parallel with diffusion models for tabular data from other organ systems. We then integrate the generative model with digitalized SDoH proxies (coined \modelname{}) to simulate interventions and reason about future disease trajectories. We conduct extensive experiments on the UK Biobank (UKB) dataset, which contains organ-specific imaging traits for the brain (44,834), heart (23,987), liver (28,722), and kidney (32,155), along with nearly 500k medical history sequences (age range: 25--89 years). \modelname{} achieves significant improvements over state-of-the-art autoregressive models of human disease and generative baselines for imaging traits.
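As a rough illustration of the ICD-coded SDoH proxies mentioned in the abstract, the sketch below separates a patient's ICD-10 event codes into SDoH proxies and other clinical events. It assumes a proxy is any code whose first letter is Z or V through Y; the names `SDOH_PREFIXES`, `is_sdoh_proxy`, and `split_events` are hypothetical, and the paper's actual preprocessing and tokenization may differ.

```python
from typing import Iterable, List, Tuple

# Assumption: SDoH proxies are ICD-10 codes starting with Z or V-Y,
# following the abstract's "chapters Z and V--Y in ICD-10".
SDOH_PREFIXES = ("V", "W", "X", "Y", "Z")


def is_sdoh_proxy(icd10_code: str) -> bool:
    """Return True if the code's first letter marks it as an SDoH proxy."""
    code = icd10_code.strip().upper()
    return code.startswith(SDOH_PREFIXES)


def split_events(codes: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split an event sequence into (SDoH proxy codes, other codes), keeping order."""
    sdoh: List[str] = []
    clinical: List[str] = []
    for code in codes:
        (sdoh if is_sdoh_proxy(code) else clinical).append(code)
    return sdoh, clinical
```

In a conditional setup like the one described, the first list could serve as the conditioning signal while the second remains part of the tokenized event history; that split is an illustrative design choice, not one stated in the abstract.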