🤖 AI Summary
Static imaging phenotypes capture a disease at a single time point and miss its progression, while learning from multi-organ imaging data alone is limited by small sample sizes. This work proposes a trajectory-aware distillation framework that uses population-level disease trajectories learned from longitudinal electronic health records as structural priors. Knowledge from the trajectory model is transferred to an organ-wise imaging encoder via geometry-preserving alignment, and the two representations can additionally be fused through cross-attention during downstream prediction. On UK Biobank data covering 159 diseases, the method consistently improves both discrimination (AUC) and time-to-onset prediction error (MAE), with the largest gains for low-prevalence conditions. Similarity relationships in the imaging embedding space also align with those in the trajectory space, which supports the premise that the two modalities share part of their structure.
📝 Abstract
Imaging-derived phenotypes (IDPs) summarize multi-organ physiology but provide only static snapshots of diseases that evolve over time. In contrast, longitudinal electronic health records encode disease trajectories through temporal dependencies among past diagnosis events and comorbidity structure. We hypothesize that IDPs and disease trajectories contain partially shared disease-relevant structure. We propose a trajectory-aware distillation framework that transfers structural knowledge from a generative disease trajectory Transformer into an organ-wise IDP encoder. A population-scale trajectory model trained on longitudinal diagnosis sequences produces subject-level embeddings that supervise IDP representation learning via geometry-preserving alignment. During downstream prediction, trajectory and imaging representations can also be fused via cross-attention. Across 159 diseases in the UK Biobank cohort, trajectory-aware pretraining consistently improves both discrimination (AUC) and time-to-onset prediction (MAE), with the largest gains for low-prevalence diseases. Similarity relationships in IDP embedding space also align with those in trajectory space, providing supportive evidence for partially aligned representation geometry. These results suggest that population-scale generative disease models can serve as structural priors for data-limited imaging modalities, improving robustness under realistic cohort constraints.
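To make "geometry-preserving alignment" concrete, the following is a minimal sketch of one common way such an objective can be written: the student (IDP) embeddings are encouraged to reproduce the pairwise cosine-similarity structure of the teacher (trajectory) embeddings. The abstract does not specify the loss, so this particular form, the function names `cosine_similarity_matrix` and `geometry_alignment_loss`, and the toy inputs are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine_similarity_matrix(embeddings):
    """Return the pairwise cosine similarities between rows (list of lists)."""
    normed = []
    for row in embeddings:
        norm = math.sqrt(sum(x * x for x in row))
        if norm == 0.0:
            raise ValueError("cannot normalize a zero-norm embedding")
        normed.append([x / norm for x in row])
    return [[sum(a * b for a, b in zip(u, v)) for v in normed] for u in normed]


def geometry_alignment_loss(student, teacher):
    """Hypothetical alignment loss (an assumption, not the paper's definition).

    Mean squared difference between the off-diagonal entries of the student
    and teacher similarity matrices. Rows must refer to the same subjects in
    the same order; the two embedding dimensions may differ.
    """
    n = len(student)
    if n != len(teacher):
        raise ValueError("student and teacher must cover the same subjects")
    if n < 2:
        raise ValueError("at least two subjects are needed to compare geometry")
    s = cosine_similarity_matrix(student)
    t = cosine_similarity_matrix(teacher)
    total = sum(
        (s[i][j] - t[i][j]) ** 2
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return total / (n * (n - 1))


# Toy example: three subjects, 2-D teacher space, 3-D student space.
teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
aligned_student = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [1.0, 1.0, 0.0]]
collapsed_student = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

loss_aligned = geometry_alignment_loss(aligned_student, teacher)
loss_collapsed = geometry_alignment_loss(collapsed_student, teacher)
```

Because the loss compares similarity matrices rather than the embeddings themselves, the student and teacher spaces do not need the same dimensionality, which is one reason relational objectives of this kind are a natural fit for cross-modal distillation. In this toy case the aligned student reproduces the teacher's similarity structure and gets a loss near zero, while the collapsed student, which maps every subject to the same direction, is penalized.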