TGV: Tabular Data-Guided Learning of Visual Cardiac Representations

📅 2025-03-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses medical image representation learning under clinical annotation scarcity. We propose a tabular-data-guided contrastive learning framework that leverages structured clinical data, such as demographics and physiological metrics, to define patient-level positive/negative pairs for cardiac short-axis MRI, without requiring a joint embedding space or additional annotations. Our key contribution is the first use of tabular data as an external semantic signal that implicitly guides a unimodal image encoder toward clinically interpretable representations. Combining multi-source clinical data from the UK Biobank with MRI-specific augmentation strategies, our method significantly outperforms both pure image-augmentation and joint-embedding baselines on cardiovascular disease classification and cardiac phenotyping tasks. Notably, the image encoder spontaneously captures clinically relevant attributes, including age and sex, enhancing generalization and zero-shot inference in real-world clinical settings.

📝 Abstract
Contrastive learning methods in computer vision typically rely on different views of the same image to form pairs. However, in medical imaging, we often seek to compare entire patients with different phenotypes rather than just multiple augmentations of one scan. We propose harnessing clinically relevant tabular data to identify distinct patient phenotypes and form more meaningful pairs in a contrastive learning framework. Our method uses tabular attributes to guide the training of visual representations, without requiring a joint embedding space. We demonstrate its strength using short-axis cardiac MR images and clinical attributes from the UK Biobank, where tabular data helps to more effectively distinguish between patient subgroups. Evaluation on downstream tasks, including fine-tuning and zero-shot prediction of cardiovascular diseases and cardiac phenotypes, shows that incorporating tabular data yields stronger visual representations than conventional methods that rely solely on image augmentations or combined image-tabular embeddings. Furthermore, we demonstrate that image encoders trained with tabular guidance are capable of embedding demographic information in their representations, allowing them to use insights from tabular data for unimodal predictions, making them well-suited to real-world medical settings where extensive clinical annotations may not be routinely available at inference time. The code will be available on GitHub.
Problem

Research questions and friction points this paper is trying to address.

Leveraging tabular data to enhance visual cardiac representation learning.
Improving patient phenotype distinction using clinical attributes in contrastive learning.
Enabling unimodal predictions by embedding demographic information in image encoders.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses tabular data to guide visual representation learning.
Avoids a joint embedding space for image and tabular data.
Enhances cardiac phenotype differentiation using clinical attributes.
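The core idea (tabular attributes decide which patients count as positives in a contrastive loss over image embeddings, with no joint image-tabular embedding) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`tabular_guided_pairs`, `tabular_guided_contrastive_loss`), the cosine-similarity threshold for pairing, and the InfoNCE-style loss form are all assumptions made for this sketch.

```python
import numpy as np

def tabular_guided_pairs(tab, threshold=0.9):
    """Mark patient pairs as positives when their tabular feature
    vectors (e.g. demographics, physiological metrics) are similar.
    Returns a boolean (n, n) mask with the diagonal excluded."""
    t = tab / np.linalg.norm(tab, axis=1, keepdims=True)
    pos = (t @ t.T) >= threshold          # cosine similarity in tabular space
    np.fill_diagonal(pos, False)          # a patient is not its own pair
    return pos

def tabular_guided_contrastive_loss(z, pos_mask, temperature=0.1):
    """InfoNCE-style loss over image embeddings z (n, d), where the
    positives come from tabular similarity rather than augmentations
    of the same scan."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = (z @ z.T) / temperature
    np.fill_diagonal(logits, -np.inf)     # drop self-similarity terms
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # average log-probability of the tabular-defined positives per anchor
    pos_terms = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    n_pos = pos_mask.sum(axis=1)
    valid = n_pos > 0                     # skip anchors with no positives
    return float(-(pos_terms[valid] / n_pos[valid]).mean())
```

Because only the pairing rule consults the tabular data, the trained image encoder remains unimodal: at inference time it takes images alone, which matches the paper's zero-shot setting where clinical annotations may be unavailable.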
Marta Hasny
Helmholtz Munich / TUM
Maxime Di Folco
School of Computation, Information and Technology, Technical University of Munich, Germany; Institute of Machine Learning for Biomedical Imaging, Helmholtz Munich, Germany
Keno Bressem
Technical University Munich
deep learning · radiomics · microwave ablation
Julia Schnabel
School of Computation, Information and Technology, Technical University of Munich, Germany; Institute of Machine Learning for Biomedical Imaging, Helmholtz Munich, Germany; School of Biomedical Engineering & Imaging Sciences, King’s College London, UK