🤖 AI Summary
Cross-modal alignment between 3D brain MRI and clinical tabular data remains challenging in medical settings with limited annotated samples. Method: We propose MedCLIP, the first medical-domain-specific 3D CLIP framework, featuring a domain-adapted 3D Vision Transformer (ViT) as the visual encoder, inter-batch negative sample accumulation to stabilize contrastive learning under data scarcity, and an end-to-end shared embedding space for MRI and tabular data. Contribution/Results: Trained on only 62 3D MRI scans, MedCLIP eliminates the reliance on large-scale 2D image corpora typical of standard CLIP. Experiments demonstrate state-of-the-art zero-shot classification accuracy and cross-modal retrieval performance, confirming that semantic alignment is feasible in low-data regimes. This establishes a scalable, resource-efficient paradigm for multimodal representation learning in data-constrained medical AI applications.
📝 Abstract
Multi-modal models require aligned, shared embedding spaces. However, common CLIP-based approaches require large numbers of samples and do not natively support 3D or tabular data, both of which are crucial in the medical domain. To address these issues, we revisit CLIP-style alignment by training a domain-specific 3D foundation model as an image encoder and demonstrate that modality alignment is feasible with only 62 MRI scans. Our approach is enabled by a simple embedding accumulation strategy, required for training in 3D, which scales the number of negative pairs across batches in order to stabilize training. We perform a thorough evaluation of various design choices, including the choice of backbone and loss functions, and evaluate the proposed methodology on zero-shot classification and image retrieval tasks. While zero-shot image retrieval remains challenging, the zero-shot classification results demonstrate that the proposed approach can meaningfully align the representations of 3D MRI with tabular data.
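The embedding accumulation strategy can be sketched as a symmetric CLIP-style InfoNCE loss whose negative set is enlarged with detached embeddings kept from previous batches, so that small 3D-compatible batch sizes still yield many negative pairs. This is a minimal illustration under assumed details, not the paper's implementation; the class name, queue length, and temperature below are illustrative:

```python
import torch
import torch.nn.functional as F

class AccumulatedInfoNCE:
    """Symmetric InfoNCE loss whose negatives include embeddings
    accumulated from previous batches (detached, so no gradients
    flow through the accumulated entries)."""

    def __init__(self, max_batches=4, temperature=0.07):
        self.max_batches = max_batches   # how many past batches to keep
        self.temperature = temperature
        self.img_queue = []              # detached image embeddings
        self.tab_queue = []              # detached tabular embeddings

    def __call__(self, img_emb, tab_emb):
        img_emb = F.normalize(img_emb, dim=-1)
        tab_emb = F.normalize(tab_emb, dim=-1)

        # Candidate sets: current batch plus accumulated negatives.
        tab_all = torch.cat([tab_emb] + self.tab_queue, dim=0)
        img_all = torch.cat([img_emb] + self.img_queue, dim=0)

        # Positives sit on the diagonal of the first (current-batch) block.
        logits_i2t = img_emb @ tab_all.t() / self.temperature
        logits_t2i = tab_emb @ img_all.t() / self.temperature
        targets = torch.arange(img_emb.size(0), device=img_emb.device)

        loss = 0.5 * (F.cross_entropy(logits_i2t, targets)
                      + F.cross_entropy(logits_t2i, targets))

        # Enqueue current embeddings for later batches; cap queue length.
        self.img_queue = (self.img_queue + [img_emb.detach()])[-self.max_batches:]
        self.tab_queue = (self.tab_queue + [tab_emb.detach()])[-self.max_batches:]
        return loss
```

With a batch size of B and a queue of k past batches, each image is contrasted against (k + 1) * B tabular candidates rather than B, which is the stabilizing effect the abstract attributes to the accumulation strategy.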