🤖 AI Summary
This study addresses the limited accessibility of cine cardiac magnetic resonance (CMR) imaging, which is costly, requires high spatio-temporal resolution, and is the standard source of cardiac phenotypes. The authors propose C-TRIP, a framework that jointly models localizer MRI scans (acquired rapidly for scan planning and then discarded), electrocardiogram (ECG) signals, and tabular patient metadata in a three-stage multimodal pipeline: independent unimodal encoder training, cross-modal alignment into a unified latent space, and phenotype prediction from the enriched representation. All three modalities are used to learn the latent space; at inference, C-TRIP predicts cardiac phenotypes from localizer MRI alone, yielding accurate functional phenotypes and high correlations for structural phenotypes. The results suggest that low-cost, opportunistic cardiac assessment is feasible and could reduce reliance on comprehensive CMR protocols.
📝 Abstract
Cardiovascular diseases are the leading cause of death. Cardiac phenotypes (CPs), e.g., ejection fraction, are the gold standard for assessing cardiac health, but they are derived from cine cardiac magnetic resonance imaging (CMR), which is costly and requires high spatio-temporal resolution. Every magnetic resonance (MR) examination begins with rapid, coarse localizers for scan planning, which are discarded thereafter. Despite their non-diagnostic image quality and lack of temporal information, localizers can rapidly provide valuable structural information. In addition to imaging, patient-level information, including demographics and lifestyle, influences the assessment of cardiac health. Electrocardiograms (ECGs) are inexpensive, routinely ordered in clinical practice, and capture the temporal activity of the heart. Here, we introduce C-TRIP (Cardiac Tri-modal Representations for Imaging Phenotypes), a multi-modal framework that aligns localizer MRI, ECG signals, and tabular metadata to learn a robust latent space and predict CPs using localizer images as an opportunistic alternative to CMR. By combining these three modalities, we leverage inexpensive spatial and temporal information from localizers and ECGs, respectively, while benefiting from the patient-specific context provided by tabular data. Our pipeline consists of three stages. First, encoders are trained independently to learn uni-modal representations. The second stage fuses the pre-trained encoders to unify the latent space. The final stage uses the enriched representation space for CP prediction, with inference performed exclusively on localizer MRI. The proposed C-TRIP yields accurate functional CPs and high correlations for structural CPs. Since localizers are inherently rapid and low-cost, our C-TRIP framework could make CP estimation more accessible.
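
To make the three-stage structure described above more concrete, the following is a minimal NumPy sketch on random toy data. It is not the authors' implementation. The abstract does not state the alignment objective, the encoder architectures, or the prediction head, so everything here is an assumption made for illustration: the linear encoders, the symmetric InfoNCE-style contrastive loss, the ridge regression head, and all names (`make_linear_encoder`, `info_nce`, `tri_modal_alignment_loss`, `fit_ridge_head`, `predict_cp`) are hypothetical. The sketch only evaluates the stage-2 loss and does not run any optimisation.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products act as cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def make_linear_encoder(in_dim, latent_dim, rng):
    """Stage 1 stand-in (assumption): a fixed random linear map instead of a trained encoder."""
    weights = rng.normal(scale=in_dim ** -0.5, size=(in_dim, latent_dim))
    return lambda x: l2_normalize(x @ weights)

def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE-style loss (assumed objective); row i of a and row i of b form a positive pair."""
    logits = a @ b.T / temperature

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)  # subtract row max for numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(logits.T))

def tri_modal_alignment_loss(z_loc, z_ecg, z_tab):
    """Stage 2 (assumed form): average the pairwise losses over the three modality pairs."""
    return (info_nce(z_loc, z_ecg) + info_nce(z_loc, z_tab) + info_nce(z_ecg, z_tab)) / 3.0

def fit_ridge_head(z, y, lam=1e-2):
    """Stage 3 stand-in (assumption): ridge-regularised least squares; the bias is penalised too, for brevity."""
    zb = np.hstack([z, np.ones((len(z), 1))])
    return np.linalg.solve(zb.T @ zb + lam * np.eye(zb.shape[1]), zb.T @ y)

def predict_cp(localizer_features, localizer_encoder, head):
    """Inference path: only localizer features are needed, no ECG or tabular input."""
    z = localizer_encoder(localizer_features)
    return np.hstack([z, np.ones((len(z), 1))]) @ head

# Toy data: random features standing in for localizer, ECG, and tabular inputs (not real data).
rng = np.random.default_rng(0)
n, latent_dim = 32, 16
x_loc = rng.normal(size=(n, 64))
x_ecg = rng.normal(size=(n, 48))
x_tab = rng.normal(size=(n, 8))

enc_loc = make_linear_encoder(64, latent_dim, rng)
enc_ecg = make_linear_encoder(48, latent_dim, rng)
enc_tab = make_linear_encoder(8, latent_dim, rng)

z_loc, z_ecg, z_tab = enc_loc(x_loc), enc_ecg(x_ecg), enc_tab(x_tab)
loss = tri_modal_alignment_loss(z_loc, z_ecg, z_tab)

# Made-up regression targets playing the role of a cardiac phenotype.
y = rng.normal(loc=60.0, scale=5.0, size=n)
head = fit_ridge_head(z_loc, y)
cp_pred = predict_cp(x_loc, enc_loc, head)
print("alignment loss:", loss, "| predictions:", cp_pred.shape)
```

The point of the sketch is the data flow: all three modalities enter the alignment loss, while `predict_cp` takes localizer features only, mirroring the abstract's statement that inference is performed exclusively on localizer MRI.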