Multi-modal vision-language model for generalizable annotation-free pathology localization and clinical diagnosis

📅 2024-01-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Medical image pathology localization typically relies on labor-intensive expert annotations and generalizes poorly. To address this, we propose AFLoc, a multimodal vision-language model built on a contrastive learning framework over hierarchical semantic structures, enabling annotation-free pathology localization and clinical diagnosis without expert image annotations. Methodologically, AFLoc combines multi-granularity report-image semantic alignment, joint vision-language embedding, and cross-modal self-supervised representation learning, supporting zero-shot transfer across X-ray, histopathology, and fundus imaging, as well as open-set pathology recognition. Evaluated on six external datasets covering 20 thoracic pathologies, AFLoc achieves state-of-the-art performance, and its cross-modal localization accuracy surpasses human expert baselines. Moreover, AFLoc significantly reduces clinical annotation costs while enhancing robustness and deployability in open-world settings.

📝 Abstract
Defining pathologies automatically from medical images aids the understanding of the emergence and progression of diseases, and such an ability is crucial in clinical diagnostics. However, existing deep learning models heavily rely on expert annotations and lack generalization capabilities in open clinical environments. In this study, we present a generalizable vision-language model for Annotation-Free pathology Localization (AFLoc). The core strength of AFLoc lies in its extensive multi-level semantic structure-based contrastive learning, which comprehensively aligns multi-granularity medical concepts from reports with abundant image features, to adapt to the diverse expressions of pathologies and unseen pathologies without reliance on image annotations from experts. We conducted primary experiments on a dataset of 220K chest X-ray image-report pairs, and performed extensive validation across six external datasets encompassing 20 types of chest pathologies. The results demonstrate that AFLoc outperforms state-of-the-art methods in both annotation-free localization and classification tasks. Additionally, we assessed the generalizability of AFLoc on other modalities, including histopathology and retinal fundus images. Extensive experiments show that AFLoc exhibits robust generalization capabilities, even surpassing human benchmarks in localizing five different types of pathological images. These results highlight the potential of AFLoc in reducing annotation requirements and its applicability in complex clinical environments.
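The image-report alignment the abstract describes is, at its core, a contrastive objective applied at several granularities. A minimal sketch of one such level, assuming a CLIP-style symmetric InfoNCE loss over paired embeddings (the function names and the multi-level summation are illustrative assumptions, not the authors' code):

```python
import numpy as np

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss between paired image and text embeddings.

    img_emb, txt_emb: (N, D) arrays; row i of each matrix is a matched pair,
    and all other rows serve as in-batch negatives.
    """
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N) similarity matrix

    def xent_diag(l):
        # cross-entropy with the diagonal (matched pair) as the target class
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # average the image-to-text and text-to-image directions
    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))

# A multi-level variant, as the abstract suggests, would sum this loss over
# several granularities, e.g. word/patch, sentence/region, report/image:
#   total = sum(info_nce(v, t) for v, t in zip(visual_levels, text_levels))
```

Matched pairs pull together and in-batch mismatches push apart, which is what lets pathology localization emerge without any region-level annotations.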
Problem

Research questions and friction points this paper is trying to address.

Develops annotation-free pathology localization model for clinical diagnostics
Enhances generalization across diverse pathologies without expert annotations
Validates performance on multiple medical imaging modalities and datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal vision-language model for pathology localization
Annotation-free learning with multi-level semantic alignment
Generalizable across diverse pathologies and image modalities
Hao Yang
Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Peng Cheng Laboratory, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
Hong-Yu Zhou
Assistant Professor of Biomedical Engineering, Tsinghua University. Past: Harvard Medical School.
AI for Healthcare, AI for Medicine, Biomedical AI
Jiarun Liu
Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Peng Cheng Laboratory, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
Weijian Huang
Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Peng Cheng Laboratory, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
Zhihuan Li
Institute for Artificial Intelligence in Medicine and Faculty of Medicine, Macau University of Science and Technology, Macau, China
Yuanxu Gao
Institute for Artificial Intelligence in Medicine and Faculty of Medicine, Macau University of Science and Technology, Macau, China
Cheng Li
Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Qiegen Liu
Nanchang University
medical imaging, image processing
Yong Liang
Qi Yang
Song Wu
Southwest University
Computer Vision, Machine Learning, Deep Learning, Multimedia
Tao Tan
FCA MPU
Medical Imaging AI
Hairong Zheng
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
biomedical imaging
Kang Zhang
National Clinical Research Center for Ocular Diseases, Eye Hospital and Advanced Institute for Eye Health and Diseases, Wenzhou Medical University, Wenzhou, China
Shanshan Wang
Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China