🤖 AI Summary
Clinical echocardiography (ECHO) is resource-intensive, limiting accessibility for cardiac function assessment. This work investigates using low-cost, widely available electrocardiograms (ECGs) as partial substitutes for ECHO to alleviate healthcare system burdens.
Method: We propose an uncertainty-aware cross-modal teacher–student framework that integrates PCME++ probabilistic embedding with the ECHO-CLIP vision–language pretrained model. It performs knowledge distillation from ECHO to ECG via probabilistic contrastive learning, enabling zero-shot and few-shot prediction. Crucially, it aligns ECHO image semantics and ECG time-series signals within a unified probabilistic embedding space and explicitly models prediction uncertainty to identify information-poor ECG segments.
Contribution/Results: To our knowledge, this is the first method to achieve such semantic–temporal alignment across modalities while quantifying predictive uncertainty. On key ECHO metrics—including left ventricular ejection fraction (LVEF) and left ventricular end-systolic volume (LVESV)—it significantly outperforms existing ECG foundation models, validating the efficacy of cross-modal distillation and uncertainty-guided modeling.
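The probabilistic contrastive distillation described above can be illustrated with a minimal PyTorch sketch. This is an assumption-laden simplification of the PCME++-style objective, not the paper's actual implementation: each modality is embedded as a diagonal Gaussian (mean plus log-variance), pairwise distances use the closed-form expected squared distance between Gaussians, and matched ECG–ECHO pairs from the same patient are pulled together with a binary matching loss. All function names and the scale/shift parameters `a`, `b` are hypothetical.

```python
import torch
import torch.nn.functional as F

def expected_sq_dist(mu1, logvar1, mu2, logvar2):
    # Closed-form expectation for diagonal Gaussians:
    # E||z1 - z2||^2 = ||mu1 - mu2||^2 + sum(var1 + var2)
    return ((mu1 - mu2) ** 2).sum(-1) + (logvar1.exp() + logvar2.exp()).sum(-1)

def prob_match_loss(mu_ecg, logvar_ecg, mu_echo, logvar_echo, a=1.0, b=0.0):
    """Probabilistic matching loss between ECG (student) and ECHO (teacher)
    embeddings in a shared space. Diagonal entries of the batch are treated
    as matched pairs, off-diagonals as mismatches."""
    B = mu_ecg.size(0)
    # All pairwise expected distances, shape [B, B]
    d = expected_sq_dist(mu_ecg.unsqueeze(1), logvar_ecg.unsqueeze(1),
                         mu_echo.unsqueeze(0), logvar_echo.unsqueeze(0))
    # Smaller distance -> higher match logit
    logits = -a * d + b
    labels = torch.eye(B, device=d.device)
    return F.binary_cross_entropy_with_logits(logits, labels)
```

Because the distance includes the summed variances, the model is rewarded for reporting high variance only where the ECG genuinely lacks information, which is what enables uncertainty-guided analysis downstream.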
📝 Abstract
The electrocardiogram (ECG) is a widely used tool for assessing cardiac function due to its low cost and accessibility. Emerging research shows that ECGs can help predict key outcomes traditionally derived from more complex modalities such as echocardiograms (ECHO), enabling ECGs to serve as a more accessible means of estimating broader measures of cardiac function. ECHO is a particularly valuable prediction target because it requires considerable hospital resources while playing a key role in clinical cardiac assessment. To aid this use case, we introduce EchoingECG, a probabilistic student-teacher model that leverages uncertainty-aware ECG embeddings and ECHO supervision to improve ECG-based cardiac function prediction. Our approach integrates Probabilistic Cross-Modal Embeddings (PCME++), a probabilistic contrastive framework, with ECHO-CLIP, a vision-language model pre-trained on ECHO-text pairs, to distill ECHO knowledge into ECG representations. Through experiments and external validation, we show that EchoingECG outperforms state-of-the-art ECG foundation models in zero-shot, few-shot, and fine-tuning settings for ECG-based ECHO prediction. We also show that the variance estimates enabled by our method deepen understanding of model performance by identifying regions of uncertainty within ECGs. The code is available at: https://github.com/mcintoshML/EchoingECG.
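The variance estimation mentioned above can be used as a simple screening signal at inference time. The sketch below is a hypothetical illustration (the threshold and function name are assumptions, not from the paper): given per-dimension predicted log-variances for a batch of ECG embeddings, it flags samples whose average predicted variance exceeds a threshold as information-poor, so their downstream ECHO predictions can be treated with caution.

```python
import torch

def flag_uncertain(logvar: torch.Tensor, threshold: float) -> torch.Tensor:
    """Flag information-poor ECGs by their mean predicted variance.

    logvar: [batch, dim] predicted log-variances of the ECG embeddings.
    Returns a boolean mask, True where mean variance > threshold.
    """
    mean_var = logvar.exp().mean(dim=-1)
    return mean_var > threshold
```

In practice the threshold would be calibrated on a validation set, e.g. chosen so that flagged samples correspond to a target fraction of the largest prediction errors.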