MeDSLIP: Medical Dual-Stream Language-Image Pre-training with Pathology-Anatomy Semantic Alignment

📅 2024-03-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Medical vision-language models (VLMs) often struggle to model the relationship between pathology (what the disease is) and anatomy (where it occurs) because the two kinds of semantics are entangled in medical data. To address this, the authors propose MeDSLIP, a dual-stream pre-training framework that disentangles them: separate pathology and anatomy streams each align visual and textual information, while a prototypical contrastive loss and an intra-image contrastive loss regularize the relationships between the two streams. Evaluated on four chest X-ray benchmarks (NIH CXR14, RSNA Pneumonia, SIIM-ACR Pneumothorax, and COVIDx CXR-4), the method is reported to show superior generalizability and transferability across different scenarios.

📝 Abstract
Pathology and anatomy are two essential groups of semantics in medical data. Pathology describes what the diseases are, while anatomy explains where the diseases occur. They describe diseases from different perspectives, providing complementary insights into diseases. Thus, properly understanding these semantics and their relationships can enhance medical vision-language models (VLMs). However, pathology and anatomy semantics are usually entangled in medical data, hindering VLMs from explicitly modeling these semantics and their relationships. To address this challenge, we propose MeDSLIP, a novel Medical Dual-Stream Language-Image Pre-training pipeline, to disentangle pathology and anatomy semantics and model the relationships between them. We introduce a dual-stream mechanism in MeDSLIP to explicitly disentangle medical semantics into pathology-relevant and anatomy-relevant streams and align visual and textual information within each stream. Furthermore, we propose an interaction modeling module with prototypical contrastive learning loss and intra-image contrastive learning loss to regularize the relationships between pathology and anatomy semantics. We apply MeDSLIP to chest X-ray analysis and conduct comprehensive evaluations with four benchmark datasets: NIH CXR14, RSNA Pneumonia, SIIM-ACR Pneumothorax, and COVIDx CXR-4. The results demonstrate MeDSLIP's superior generalizability and transferability across different scenarios. The code is available at https://github.com/Shef-AIRE/MeDSLIP, and the pre-trained model is released at https://huggingface.co/pykale/MeDSLIP.
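As a rough illustration of "aligning visual and textual information within each stream", the sketch below computes a symmetric contrastive (InfoNCE-style) loss separately for a pathology stream and an anatomy stream and sums the two. This page does not specify the alignment objective MeDSLIP actually uses, so the choice of InfoNCE, the temperature value, and all function names here are assumptions made for illustration only.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    # Scale to unit length; leave an all-zero vector unchanged.
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def info_nce(queries, keys, temperature=0.07):
    """Mean cross-entropy where queries[i] is expected to match keys[i]."""
    queries = [normalize(q) for q in queries]
    keys = [normalize(k) for k in keys]
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, k) / temperature for k in keys]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(queries)


def stream_alignment_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric image-to-text and text-to-image loss inside one stream."""
    return 0.5 * (info_nce(image_emb, text_emb, temperature)
                  + info_nce(text_emb, image_emb, temperature))


def dual_stream_loss(pathology_pair, anatomy_pair):
    """Sum of per-stream losses; each pair is (image_emb, text_emb).

    Assumption: the two streams are weighted equally.
    """
    return (stream_alignment_loss(*pathology_pair)
            + stream_alignment_loss(*anatomy_pair))


if __name__ == "__main__":
    # Toy 2-D embeddings: matched image/text pairs point the same way.
    image = [[1.0, 0.0], [0.0, 1.0]]
    text = [[1.0, 0.0], [0.0, 1.0]]
    print(dual_stream_loss((image, text), (image, text)))
```

Keeping one loss term per stream means a mismatch in anatomy (the right finding described at the wrong location) is penalized independently of a mismatch in pathology, which is the intuition behind disentangling the two.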
Problem

Research questions and friction points this paper is trying to address.

Disentangling pathology and anatomy semantics in medical data
Modeling relationships between pathology and anatomy semantics
Enhancing medical vision-language models through semantic alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-stream mechanism disentangles pathology and anatomy semantics
Visual and textual information are aligned within each stream
Prototypical and intra-image contrastive losses regularize pathology-anatomy relationships
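The two regularizers named above can be sketched as follows. The exact formulations are not given on this page, so everything here is an assumption: prototypes are taken to be per-class mean embeddings, the prototypical loss pulls each embedding toward its own class prototype, and the intra-image loss treats the anatomy embedding paired with each pathology embedding (via a hypothetical `positive_index` list) as the positive, with the other anatomy embeddings from the same image as negatives.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def softmax_nll(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)
    return m + math.log(sum(math.exp(l - m) for l in logits)) - logits[target]


def class_prototypes(embeddings, labels, num_classes):
    """Mean embedding per class (assumed prototype definition)."""
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for emb, lab in zip(embeddings, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += emb[d]
    return [normalize([s / max(c, 1) for s in row])
            for row, c in zip(sums, counts)]


def prototypical_contrastive_loss(embeddings, labels, prototypes,
                                  temperature=0.1):
    """Pull each embedding toward its class prototype, away from the others."""
    total = 0.0
    for emb, lab in zip(embeddings, labels):
        e = normalize(emb)
        logits = [dot(e, p) / temperature for p in prototypes]
        total += softmax_nll(logits, lab)
    return total / len(embeddings)


def intra_image_contrastive_loss(pathology_emb, anatomy_emb, positive_index,
                                 temperature=0.1):
    """Within one image, pathology_emb[i] should match
    anatomy_emb[positive_index[i]]; other anatomy embeddings are negatives."""
    anatomy = [normalize(a) for a in anatomy_emb]
    total = 0.0
    for p, pos in zip(pathology_emb, positive_index):
        p = normalize(p)
        logits = [dot(p, a) / temperature for a in anatomy]
        total += softmax_nll(logits, pos)
    return total / len(pathology_emb)


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    labels = [0, 0, 1, 1]
    protos = class_prototypes(emb, labels, num_classes=2)
    print(prototypical_contrastive_loss(emb, labels, protos))
    print(intra_image_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                       [[1.0, 0.0], [0.0, 1.0]], [0, 1]))
```

Both terms reuse the same softmax cross-entropy form and differ only in what counts as a positive: a class prototype in the first case, a co-occurring anatomy embedding from the same image in the second.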
Authors

Wenrui Fan
AI Research Engineer, The University of Sheffield
Multi-modal AI, Self-supervised learning, Computer Vision

M. N. Suvon
Department of Computer Science, University of Sheffield; Centre of Machine Intelligence, University of Sheffield

Shuo Zhou
Department of Computer Science, University of Sheffield; Centre of Machine Intelligence, University of Sheffield

Xianyuan Liu
University of Sheffield
Deep Learning, Materials Design, Machine Learning

S. Alabed
Department of Infection, Immunity and Cardiovascular Disease, University of Sheffield; Department of Clinical Radiology, Sheffield Teaching Hospitals; INSIGNEO, Institute for in Silico Medicine, University of Sheffield

V. Osmani
Information School, University of Sheffield

Andrew J Swift
Department of Infection, Immunity and Cardiovascular Disease, University of Sheffield; Department of Clinical Radiology, Sheffield Teaching Hospitals; INSIGNEO, Institute for in Silico Medicine, University of Sheffield

Chen Chen
Department of Computer Science, University of Sheffield; Department of Engineering Science, University of Oxford; Department of Computing, Imperial College London

Haiping Lu
Professor of Machine Learning, University of Sheffield
Machine learning, Multimodal AI, AI4Health, AI4Science, Open-source software