🤖 AI Summary
This study addresses the challenge of disentangling anatomical structures from speckle noise and acquisition artifacts in echocardiography by introducing a latent prediction paradigm to this domain for the first time. Using 18 million unlabeled echocardiograms, it develops a large-scale foundation model that learns robust representations of cardiac anatomy through a latent prediction objective. Evaluation relies on a multi-view probing framework with a frozen backbone, combined with a physics-informed acoustic perturbation strategy. With only 1% of the labeled data, the model reaches 79% view classification accuracy. Relative to leading baselines, it improves estimation performance by approximately 20% for left ventricular ejection fraction and 17% for right ventricular systolic pressure. It also shows strong zero-shot generalization, outperforming fully fine-tuned baselines on pediatric patients, and is more robust to acoustic perturbations than existing methods.
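The summary does not say which acoustic perturbations were used or how degradation was measured. The sketch below is only an illustration under stated assumptions. It models fully developed speckle as multiplicative noise with a Rayleigh-distributed amplitude, a common textbook model. It also treats "degradation" as the relative drop of a score between clean and perturbed inputs, although the paper may report absolute drops instead. The functions `apply_speckle` and `relative_degradation` and the `strength` parameter are hypothetical and do not come from the paper.

```python
import numpy as np

def apply_speckle(image: np.ndarray, strength: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical speckle model: multiply an intensity frame in [0, 1] by
    unit-mean Rayleigh noise, blended in with weight `strength` in [0, 1]."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    # Rayleigh(scale=1) has mean sqrt(pi/2); dividing normalizes it to unit mean
    noise = rng.rayleigh(scale=1.0, size=image.shape) / np.sqrt(np.pi / 2)
    # strength=0 gives a factor of exactly 1 (no perturbation)
    factor = (1.0 - strength) + strength * noise
    return np.clip(image * factor, 0.0, 1.0)

def relative_degradation(clean_score: float, perturbed_score: float) -> float:
    """Assumed metric: fractional drop of a score (e.g. accuracy) under perturbation."""
    if clean_score == 0:
        raise ValueError("clean_score must be nonzero")
    return (clean_score - perturbed_score) / clean_score

rng = np.random.default_rng(0)
frame = np.full((8, 8), 0.5)           # toy stand-in for a B-mode frame
noisy = apply_speckle(frame, 0.5, rng)
drop = relative_degradation(0.80, 0.784)  # about 0.02, i.e. a 2% relative drop
```

Under this definition, a model whose score falls from 0.80 to 0.784 has degraded by 2%. The same absolute drop would count for more on a weaker model.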
📝 Abstract
Foundation models for echocardiography often struggle to disentangle anatomical signal from the stochastic speckle and acquisition artifacts inherent to ultrasound. We present EchoJEPA, a foundation model trained on 18 million echocardiograms from 300K patients, the largest pretraining corpus for this modality to date. By using a latent predictive objective, EchoJEPA learns robust anatomical representations that ignore speckle noise. We validate this with a novel multi-view probing framework on frozen backbones, in which EchoJEPA outperforms leading baselines by approximately 20% in left ventricular ejection fraction (LVEF) estimation and by 17% in right ventricular systolic pressure (RVSP) estimation. The model is also highly sample-efficient, reaching 79% view classification accuracy with only 1% of the labeled data, versus 42% for the best baseline trained on 100%. EchoJEPA also generalizes better: its performance degrades by only 2% under physics-informed acoustic perturbations, compared with 17% for competing models. Most notably, its zero-shot performance on pediatric patients surpasses that of fully fine-tuned baselines, establishing latent prediction as a superior paradigm for robust, generalizable medical AI.
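The abstract does not detail EchoJEPA's objective, so the following is only a toy illustration of the general JEPA-style idea. A context encoder sees the visible patches, and a predictor estimates the embeddings of the masked patches. The loss is measured against a target encoder that tracks the context encoder as an exponential moving average (EMA). As a result, the loss is defined on embeddings rather than on pixels. The one-layer encoders, mean-pooled context, L1 loss, and all names (`encode`, `latent_prediction_loss`, `ema_update`, `pos_emb`, `momentum`) are hypothetical simplifications, not the paper's architecture.

```python
import numpy as np

def encode(weights: np.ndarray, patches: np.ndarray) -> np.ndarray:
    """Hypothetical one-layer encoder: (N, D) patches -> (N, E) embeddings."""
    return np.tanh(patches @ weights)

def latent_prediction_loss(context_w, target_w, predictor_w, pos_emb, patches, mask):
    """Mean L1 distance between predicted and target embeddings of masked patches.

    patches: (N, D) flattened patches; mask: bool (N,), True = hidden from the
    context encoder; pos_emb: (N, E) tells the predictor which patch to predict.
    """
    # Context encoder sees only the visible patches
    context = encode(context_w, patches[~mask])
    # Target encoder sees the full input; in an autodiff framework this
    # branch would be wrapped in a stop-gradient
    targets = encode(target_w, patches)[mask]
    # Toy predictor: pooled context plus each masked patch's positional embedding
    pooled = context.mean(axis=0)                        # (E,)
    predictions = pooled @ predictor_w + pos_emb[mask]   # (M, E) via broadcasting
    # Loss is computed in embedding space; no pixels are reconstructed
    return float(np.abs(predictions - targets).mean())

def ema_update(target_w, context_w, momentum=0.996):
    """Move the target encoder weights toward the context encoder weights."""
    return momentum * target_w + (1.0 - momentum) * context_w

rng = np.random.default_rng(0)
N, D, E = 16, 8, 4                  # patches, patch dim, embed dim (toy sizes)
patches = rng.normal(size=(N, D))
mask = np.zeros(N, dtype=bool)
mask[:6] = True                     # hide the first 6 patches from the context encoder
context_w = 0.1 * rng.normal(size=(D, E))
target_w = context_w.copy()
predictor_w = np.eye(E)
pos_emb = 0.01 * rng.normal(size=(N, E))

loss = latent_prediction_loss(context_w, target_w, predictor_w, pos_emb, patches, mask)
target_w = ema_update(target_w, context_w)
```

This version only computes the quantities and does no training. In an actual training loop, the loss would be minimized with respect to the context encoder and predictor, and `ema_update` would run after each optimizer step.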