Self-adaptive vision-language model for 3D segmentation of pulmonary artery and vein

πŸ“… 2025-01-07
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the challenges of scarce annotated samples and insufficient cross-modal representation fusion in 3D CT segmentation of pulmonary arteries/veins, this paper proposes a language-guided adaptive cross-attention segmentation framework. Methodologically: (1) CLIP’s pre-trained text-image joint semantic features are leveraged; (2) learnable adapters enable efficient fine-tuning of CLIP on sparsely labeled 3D medical images; (3) an adaptive cross-attention mechanism dynamically fuses multi-modal representations and is embedded into the 3D U-Net decoder. Evaluated on the largest publicly available pulmonary artery/vein CT dataset to date (718 cases), our method significantly outperforms state-of-the-art approaches while reducing annotation requirements by over 60%. The code and dataset will be made publicly available.

Technology Category

Application Category

πŸ“ Abstract
Accurate segmentation of pulmonary structures iscrucial in clinical diagnosis, disease study, and treatment planning. Significant progress has been made in deep learning-based segmentation techniques, but most require much labeled data for training. Consequently, developing precise segmentation methods that demand fewer labeled datasets is paramount in medical image analysis. The emergence of pre-trained vision-language foundation models, such as CLIP, recently opened the door for universal computer vision tasks. Exploiting the generalization ability of these pre-trained foundation models on downstream tasks, such as segmentation, leads to unexpected performance with a relatively small amount of labeled data. However, exploring these models for pulmonary artery-vein segmentation is still limited. This paper proposes a novel framework called Language-guided self-adaptive Cross-Attention Fusion Framework. Our method adopts pre-trained CLIP as a strong feature extractor for generating the segmentation of 3D CT scans, while adaptively aggregating the cross-modality of text and image representations. We propose a s pecially designed adapter module to fine-tune pre-trained CLIP with a self-adaptive learning strategy to effectively fuse the two modalities of embeddings. We extensively validate our method on a local dataset, which is the largest pulmonary artery-vein CT dataset to date and consists of 718 labeled data in total. The experiments show that our method outperformed other state-of-the-art methods by a large margin. Our data and code will be made publicly available upon acceptance.
Problem

Research questions and friction points this paper is trying to address.

3D Image Segmentation
Pulmonary Artery-Vein Separation
Deep Learning Limitations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Language-Guided Adaptive Cross-Attention Fusion
CLIP Model
3D Lung Vasculature Segmentation
πŸ”Ž Similar Papers
No similar papers found.
Xiaotong Guo
Xiaotong Guo
Ph.D. in Transportation, MIT
Transportation ModelingOptimizationShared MobilityPublic Transit
D
Deqian Yang
Key Laboratory of System Software (Chinese Academy of Sciences) and State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China; School of Intelligent Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
D
Dan Wang
School of Intelligent Science and Technology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
H
Haochen Zhao
School of Computer Science and Engineering, Beihang University, Beijing, China
Y
Yuan Li
Guangzhou Jiayi Software Technology Co., Ltd.
Z
Zhilin Sui
Department of Thoracic Surgery, National Clinical Research Center for Cancer/Cancer Hospital Shenzhen Hospital
T
Tao Zhou
R&D Center, Guangxi Huayi Artificial Intelligence Medical Technology Co., Ltd
L
Lijun Zhang
Key Laboratory of System Software (Chinese Academy of Sciences) and State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, China
Yanda Meng
Yanda Meng
University of Exeter
Medical Image Analysis