Self-supervised learning of imaging and clinical signatures using a multimodal joint-embedding predictive architecture

📅 2025-09-18

🤖 AI Summary

To address the scarcity of annotated data and the tendency of multimodal models to overfit the training distribution in pulmonary nodule diagnosis, this study applies a joint embedding predictive architecture (JEPA) to self-supervised pretraining on longitudinal multimodal medical data, using an unlabeled set of patients with CT scans and linked electronic health records from the authors' home institution. After supervised finetuning, the model reaches an AUC of 0.91 on an internal cohort, ahead of an unregularized multimodal model (0.88) and an imaging-only model (0.73). On an external cohort it underperforms the imaging-only model (0.72 vs. 0.75 AUC), which points to sensitivity to distribution shift. The authors also build a synthetic environment that characterizes the conditions under which JEPA may underperform. The core contribution is an approach for leveraging unlabeled multimodal medical archives to improve predictive models, together with an account of its advantages and limitations.

📝 Abstract

The development of multimodal models for pulmonary nodule diagnosis is limited by the scarcity of labeled data and the tendency for these models to overfit to the training distribution. In this work, we leverage self-supervised learning from longitudinal and multimodal archives to address these challenges. We curate an unlabeled set of patients with CT scans and linked electronic health records from our home institution to power joint embedding predictive architecture (JEPA) pretraining. After supervised finetuning, we show that our approach outperforms an unregularized multimodal model and imaging-only model in an internal cohort (ours: 0.91, multimodal: 0.88, imaging-only: 0.73 AUC), but underperforms in an external cohort (ours: 0.72, imaging-only: 0.75 AUC). We develop a synthetic environment that characterizes the context in which JEPA may underperform. This work introduces an approach that leverages unlabeled multimodal medical archives to improve predictive models and demonstrates its advantages and limitations in pulmonary nodule diagnosis.
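
For readers unfamiliar with JEPA, the following is a minimal sketch of the general idea: a predictor maps the embedding of a context input onto the embedding of a target input, and the loss is measured in embedding space rather than on raw inputs. It is not the authors' implementation. The linear encoders, the pairing of an imaging feature vector (context) with an EHR feature vector (target), the EMA-updated target encoder, and all function names are assumptions made for illustration; the abstract does not specify these details.

```python
# Toy JEPA-style objective in pure Python.
# ASSUMPTIONS (not from the paper): linear encoders, imaging features as
# context, EHR features as target, EMA-updated target encoder.

def linear(weights, x):
    """Apply a bias-free dense layer; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def jepa_loss(context_enc, target_enc, predictor, context_x, target_x):
    """Mean squared error between predicted and actual target embeddings."""
    z_context = linear(context_enc, context_x)   # embed the context input
    z_target = linear(target_enc, target_x)      # embed the target (held fixed, no gradient)
    z_pred = linear(predictor, z_context)        # predict the target embedding
    return sum((p - t) ** 2 for p, t in zip(z_pred, z_target)) / len(z_target)

def ema_update(target_weights, online_weights, momentum=0.99):
    """Move target-encoder weights toward online weights of the same shape."""
    return [
        [momentum * t + (1.0 - momentum) * o for t, o in zip(t_row, o_row)]
        for t_row, o_row in zip(target_weights, online_weights)
    ]

if __name__ == "__main__":
    # Hypothetical 2-d features for one patient; values are made up.
    identity = [[1.0, 0.0], [0.0, 1.0]]
    imaging_features = [1.0, 2.0]
    ehr_features = [2.0, 4.0]
    print(jepa_loss(identity, identity, identity, imaging_features, ehr_features))
```

In a real training loop the context encoder and predictor would be updated by gradient descent on this loss, while the target encoder would only be refreshed through `ema_update`; that split is a common JEPA design choice and is assumed here rather than taken from the paper.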

Problem

Research questions and friction points this paper is trying to address.

Self-supervised learning for pulmonary nodule diagnosis using multimodal data

Addressing labeled data scarcity and overfitting in medical imaging models

Evaluating joint embedding architecture performance across internal and external cohorts
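
The internal and external comparisons above are reported as AUC. As a reminder of what that number measures, here is a small sketch that computes AUC as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The function name and the toy labels and scores are hypothetical and are not data from the paper.

```python
# Pairwise (Mann-Whitney) formulation of ROC AUC in pure Python.
# The labels and scores below are illustrative only, not study data.

def auc(labels, scores):
    """Return ROC AUC for binary labels (1 = positive, 0 = negative)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(positives) * len(negatives))

if __name__ == "__main__":
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(auc(toy_labels, toy_scores))
```

Computing this separately on each cohort, with the same trained model, is what allows an internal result and an external result to be compared directly.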

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised learning from multimodal archives

Joint embedding predictive architecture pretraining

Synthetic environment for performance characterization
