ProbMed: A Probabilistic Framework for Medical Multimodal Binding

πŸ“… 2025-09-29
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing medical vision-language pretraining models struggle to capture complex one-to-many and many-to-many semantic associations between medical images and clinical text. To address this, we propose the first probabilistic contrastive learning framework for medical multimodal pretraining, wherein embeddings are modeled as Gaussian distributions parameterized by mean and variance. Our method introduces an improved InfoNCE loss based on the Hellinger distance and a probabilistic compositional sampling strategy, enabling unified alignment of X-ray, electrocardiogram, echocardiogram, and clinical text modalities within a shared probabilistic embedding space. Evaluated on 13 benchmark datasets, our approach achieves significant improvements in cross-modal retrieval, zero-shot and few-shot classification, and prognostic prediction. Results demonstrate that probabilistic embedding representations enhance both the effectiveness and robustness of collaborative multimodal analysis in clinical settings.

πŸ“ Abstract
Medical decision-making requires integrating diverse medical information, from imaging to clinical narratives. These medical modalities are often acquired in a many-to-many manner. However, current medical vision-language pretraining models (Med-VLPMs) fail to directly account for this many-to-many mapping in their model training and embeddings. To address this, we present Probabilistic Modality-Enhanced Diagnosis (ProbMED), a multimodal Med-VLPM that employs probabilistic contrastive learning to model distributions over embeddings rather than deterministic estimates. ProbMED aligns four distinct modalities--chest X-rays, electrocardiograms, echocardiograms, and clinical text--into a unified probabilistic embedding space. We use InfoNCE loss with Hellinger distance to integrate inter-modality distributions. We introduce a probabilistic synthetic sampling loss that captures modality-specific mean and variance to improve intra-modality binding. Extensive experiments across 13 medical datasets demonstrate that our model outperforms current Med-VLPMs in cross-modality retrieval, zero-shot, and few-shot classification. We also demonstrate the robust integration of multiple modalities for prognostication, showing improved intra- and inter-medical modality binding.
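The abstract's central mechanism is an InfoNCE loss where each embedding is a Gaussian and similarity is measured by Hellinger distance rather than cosine similarity. A minimal sketch of that idea follows; the diagonal-covariance assumption, function names, and temperature value are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def hellinger_sq(mu1, var1, mu2, var2):
    """Squared Hellinger distance between diagonal Gaussians N(mu1, var1) and N(mu2, var2).

    H^2 = 1 - BC, where the Bhattacharyya coefficient BC factorizes over
    dimensions for diagonal covariances.
    """
    var_sum = var1 + var2
    bc = np.prod(
        np.sqrt(2.0 * np.sqrt(var1 * var2) / var_sum)
        * np.exp(-0.25 * (mu1 - mu2) ** 2 / var_sum)
    )
    return 1.0 - bc

def infonce_hellinger(mus_a, vars_a, mus_b, vars_b, tau=0.1):
    """InfoNCE over a batch of paired probabilistic embeddings from two modalities.

    Similarity between items i and j is -H^2 / tau; matching pairs sit on the
    diagonal of the similarity matrix, as in standard contrastive learning.
    """
    n = len(mus_a)
    sim = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            sim[i, j] = -hellinger_sq(mus_a[i], vars_a[i], mus_b[j], vars_b[j]) / tau
    # Cross-entropy against the diagonal (numerically stable log-softmax).
    logits = sim - sim.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Because the Hellinger distance is bounded in [0, 1] and symmetric, it gives a well-behaved similarity between distributions, unlike KL divergence, which is asymmetric and unbounded.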
Problem

Research questions and friction points this paper is trying to address.

Modeling many-to-many medical modality mappings in embeddings
Aligning diverse medical data into unified probabilistic space
Improving cross-modality retrieval and classification performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Probabilistic contrastive learning models embedding distributions
Hellinger distance integrates inter-modality distributions
Synthetic sampling loss captures intra-modality variance
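The synthetic sampling idea rests on drawing stochastic embeddings from each modality's predicted Gaussian. The paper's exact loss is not reproduced here; the sketch below shows only the standard reparameterization step that such a sampling loss would build on, with illustrative function and parameter names.

```python
import numpy as np

def sample_embeddings(mu, var, n_samples=8, rng=None):
    """Draw samples from a diagonal-Gaussian embedding via the
    reparameterization trick: z = mu + sqrt(var) * eps, eps ~ N(0, I).

    The samples capture the modality-specific mean and variance and can be
    fed into a contrastive term to tighten intra-modality binding.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal((n_samples,) + mu.shape)
    return mu + np.sqrt(var) * eps
```

Keeping the noise outside the parameters (rather than sampling directly from the Gaussian) is what makes the draw differentiable with respect to mu and var during training.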
Yuan Gao
Peter Munk Cardiac Centre, Ted Rogers Centre for Heart Research, University Health Network, University of Toronto, Vector Institute
Sangwook Kim
Joint Department of Medical Imaging, University Health Network, University of Toronto, Vector Institute
Jianzhong You
Peter Munk Cardiac Centre, Ted Rogers Centre for Heart Research, University Health Network, University of Toronto, Vector Institute
Chris McIntosh
Scientist, UHN; Assistant Professor, U of Toronto
Medical Image Analysis · Machine Learning · Optimization