Self-supervised learning of imaging and clinical signatures using a multimodal joint-embedding predictive architecture

📅 2025-09-18
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address two challenges in pulmonary nodule diagnosis, scarce annotated data and overfitting to the training distribution, this study applies a Joint Embedding Predictive Architecture (JEPA) to self-supervised pretraining on longitudinal multimodal medical data, jointly modeling CT scans and electronic health records to learn cross-modal representations. The pretrained model is then fine-tuned with supervision. On an internal cohort it achieves an AUC of 0.91, outperforming an unregularized multimodal baseline (0.88) and an imaging-only baseline (0.73). On an external cohort, however, it underperforms the imaging-only model (0.72 vs. 0.75 AUC), revealing sensitivity to distribution shift; a synthetic environment is used to characterize when JEPA may underperform, giving empirical grounding for future domain generalization research. The core contribution is a JEPA-based approach to self-supervised learning on longitudinal multimodal clinical archives.

📝 Abstract
The development of multimodal models for pulmonary nodule diagnosis is limited by the scarcity of labeled data and the tendency of these models to overfit to the training distribution. In this work, we leverage self-supervised learning on longitudinal, multimodal archives to address these challenges. We curate an unlabeled cohort of patients with CT scans and linked electronic health records (EHRs) from our home institution for joint embedding predictive architecture (JEPA) pretraining. After supervised fine-tuning, our approach outperforms an unregularized multimodal model and an imaging-only model in an internal cohort (ours: 0.91, multimodal: 0.88, imaging-only: 0.73 AUC), but underperforms the imaging-only model in an external cohort (ours: 0.72, imaging-only: 0.75 AUC). We develop a synthetic environment that characterizes the conditions under which JEPA may underperform. This work introduces an approach that leverages unlabeled multimodal medical archives to improve predictive models and demonstrates both its advantages and its limitations in pulmonary nodule diagnosis.
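As a rough illustration only, the following toy sketch shows the generic JEPA idea: a context encoder sees a partial view of a patient record, a predictor maps that embedding to the embedding a separate target encoder assigns to the full record, the loss is computed in embedding space rather than on raw inputs, and the target encoder receives no gradient but tracks the context encoder by an exponential moving average (EMA). Everything specific here is an assumption for illustration, not the authors' method: the hypothetical names `ToyJEPA` and `context_view`, linear encoders, one concatenated vector of imaging and EHR features, masking out one modality to form the context, a mean-squared-error loss, and the EMA rate `tau`.

```python
import random


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def context_view(x, n_img, keep):
    """Hypothetical masking scheme: keep one modality block and zero the other.
    The first n_img entries are imaging features; the rest are EHR features."""
    if keep == "img":
        return x[:n_img] + [0.0] * (len(x) - n_img)
    if keep == "ehr":
        return [0.0] * n_img + x[n_img:]
    raise ValueError("keep must be 'img' or 'ehr'")


class ToyJEPA:
    """Linear JEPA-style model: context encoder E, predictor P, EMA target encoder T."""

    def __init__(self, in_dim, emb_dim, n_img, seed=0):
        rng = random.Random(seed)
        self.n_img = n_img
        self.enc = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(emb_dim)]
        self.pred = [[rng.gauss(0.0, 0.1) for _ in range(emb_dim)] for _ in range(emb_dim)]
        self.target = [row[:] for row in self.enc]  # target encoder starts as a copy of E

    def _forward(self, x, keep):
        ctx = context_view(x, self.n_img, keep)
        h = matvec(self.enc, ctx)      # context embedding
        p = matvec(self.pred, h)       # predicted target embedding
        t = matvec(self.target, x)     # target embedding of the full record (treated as constant)
        return ctx, h, p, t

    def loss(self, x, keep):
        _, _, p, t = self._forward(x, keep)
        return sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(p)

    def step(self, x, keep, lr=0.01, tau=0.99):
        ctx, h, p, t = self._forward(x, keep)
        d = len(p)
        gp = [2.0 * (pi - ti) / d for pi, ti in zip(p, t)]                       # dL/dp
        gh = [sum(self.pred[i][j] * gp[i] for i in range(d)) for j in range(d)]  # dL/dh = P^T dL/dp
        for i in range(d):                      # dL/dP[i][j] = gp[i] * h[j]
            for j in range(d):
                self.pred[i][j] -= lr * gp[i] * h[j]
        for j in range(d):                      # dL/dE[j][k] = gh[j] * ctx[k]
            for k in range(len(ctx)):
                self.enc[j][k] -= lr * gh[j] * ctx[k]
        # Stop-gradient: T gets no gradient update; it only follows E by EMA.
        for j, row in enumerate(self.target):
            for k in range(len(row)):
                row[k] = tau * row[k] + (1.0 - tau) * self.enc[j][k]


# Usage: 4 imaging + 2 EHR features for one made-up patient.
model = ToyJEPA(in_dim=6, emb_dim=3, n_img=4)
patient = [0.5, -1.0, 0.3, 0.8, 1.2, -0.4]
for step_idx in range(50):
    model.step(patient, keep="img" if step_idx % 2 == 0 else "ehr")
print(model.loss(patient, "img"))
```

Alternating which modality is kept lets both blocks of encoder weights receive gradient; if the same modality were always hidden, its encoder columns would never update. A practical model would replace the linear maps with neural encoders suited to CT and EHR data and train over many patients rather than one toy vector.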
Problem

Research questions and friction points this paper is trying to address.

Self-supervised learning for pulmonary nodule diagnosis using multimodal data
Addressing labeled data scarcity and overfitting in medical imaging models
Evaluating joint embedding predictive architecture performance across internal and external cohorts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised learning from multimodal archives
Joint embedding predictive architecture pretraining
Synthetic environment for performance characterization
Thomas Z. Li
Department of Biomedical Engineering, Vanderbilt University, Nashville, TN
Aravind R. Krishnan
Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN
Lianrui Zuo
Vanderbilt University
Medical image analysis, MRI, CT, Image harmonization, Image synthesis
John M. Still
Department of Biomedical Informatics, Vanderbilt University, Nashville, TN
Kim L. Sandler
Department of Radiology and Radiological Sciences, Vanderbilt University Medical Center, Nashville, TN
Fabien Maldonado
Vanderbilt University
Interventional pulmonology, lung imaging
Thomas A. Lasko
Department of Biomedical Informatics, Vanderbilt University, Nashville, TN
Bennett A. Landman
Department of Biomedical Engineering, Vanderbilt University, Nashville, TN