🤖 AI Summary
ECG suffers from low spatial resolution, limiting precise localization of cardiac pathology, while CMR, though anatomically informative, is costly and far less accessible. To address this, the authors propose a self-supervised pretraining framework that combines multimodal contrastive learning with masked data modelling to transfer anatomical information from CMR imaging into ECG representations. By enforcing cross-modal alignment between ECG and CMR, they show empirically that the learned ECG latent space incorporates information from CMR-defined regions of interest. Evaluated on 40,044 UK Biobank subjects, the method improves CVD risk prediction by up to 12.19% and cardiac phenotype prediction by up to 27.59%. A qualitative analysis supports the interpretability of the learned representations, pointing toward low-cost, noninvasive cardiac phenotyping from ECG alone.
📝 Abstract
Cardiovascular diseases (CVD) can be diagnosed using various diagnostic modalities. The electrocardiogram (ECG) is a cost-effective and widely available diagnostic aid that provides functional information of the heart. However, its ability to classify and spatially localise CVD is limited. In contrast, cardiac magnetic resonance (CMR) imaging provides detailed structural information of the heart and thus enables evidence-based diagnosis of CVD, but long scan times and high costs limit its use in clinical routine. In this work, we present a deep learning strategy for cost-effective and comprehensive cardiac screening solely from ECG. Our approach combines multimodal contrastive learning with masked data modelling to transfer domain-specific information from CMR imaging to ECG representations. In extensive experiments using data from 40,044 UK Biobank subjects, we demonstrate the utility and generalisability of our method for subject-specific risk prediction of CVD and the prediction of cardiac phenotypes using only ECG data. Specifically, our novel multimodal pre-training paradigm improves performance by up to 12.19% for risk prediction and 27.59% for phenotype prediction. In a qualitative analysis, we demonstrate that our learned ECG representations incorporate information from CMR image regions of interest. Our entire pipeline is publicly available at https://github.com/oetu/MMCL-ECG-CMR.
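The paper's exact training objective is not reproduced here, but the cross-modal alignment step it describes can be illustrated with a CLIP-style symmetric InfoNCE loss between paired ECG and CMR embeddings. This is a minimal NumPy sketch under that assumption; the function name, temperature value, and toy embeddings are illustrative, not the authors' implementation:

```python
import numpy as np

def info_nce_loss(ecg_emb, cmr_emb, temperature=0.1):
    """Symmetric InfoNCE loss aligning paired ECG and CMR embeddings.

    Row i of each matrix is assumed to come from the same subject,
    so the diagonal of the similarity matrix holds the positive pairs.
    """
    # L2-normalise so the dot product becomes cosine similarity
    ecg = ecg_emb / np.linalg.norm(ecg_emb, axis=1, keepdims=True)
    cmr = cmr_emb / np.linalg.norm(cmr_emb, axis=1, keepdims=True)
    logits = ecg @ cmr.T / temperature  # (N, N) similarity matrix

    def xent(l):
        # cross-entropy with the positive pair on the diagonal
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average the ECG->CMR and CMR->ECG directions
    return 0.5 * (xent(logits) + xent(logits.T))

# Toy check: embeddings from the same subject should score a lower
# loss than randomly paired ones.
rng = np.random.default_rng(0)
cmr = rng.normal(size=(8, 16))
aligned = info_nce_loss(cmr + 0.01 * rng.normal(size=(8, 16)), cmr)
shuffled = info_nce_loss(rng.normal(size=(8, 16)), cmr)
print(aligned < shuffled)
```

In practice such a loss would be combined with the masked data modelling objective the abstract mentions and optimised over encoder networks for each modality, rather than over fixed embedding matrices.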