Text Embedded Swin-UMamba for DeepLesion Segmentation

📅 2025-08-08

📈 Citations: 0

✨ Influential: 0

career value

168K/year

🤖 AI Summary

This study addresses the limited accuracy of automatic lesion segmentation in CT images for chronic diseases (e.g., lymphoma). We propose the first multimodal segmentation framework integrating a large language model (LLM) with the Swin-UMamba architecture. Methodologically, radiology report text—parsed and embedded by the LLM—is fused into the Swin-UMamba backbone to enable cross-modal alignment and joint modeling of textual semantics and visual features. Our key contribution is the first integration of an LLM as a differentiable text encoder into a hybrid architecture combining state-space models (SSMs) and vision transformers, effectively mitigating modality heterogeneity and semantic gap challenges inherent in conventional multimodal segmentation. Evaluated on the DeepLesion dataset, our method achieves a Dice coefficient of 82.0% and a Hausdorff distance of 6.58 pixels—significantly outperforming existing unimodal and multimodal approaches.

Technology Category

Application Category

📝 Abstract

Segmentation of lesions on CT enables automatic measurement for clinical assessment of chronic diseases (e.g., lymphoma). Integrating large language models (LLMs) into the lesion segmentation workflow offers the potential to combine imaging features with descriptions of lesion characteristics from the radiology reports. In this study, we investigate the feasibility of integrating text into the Swin-UMamba architecture for the task of lesion segmentation. The publicly available ULS23 DeepLesion dataset was used along with short-form descriptions of the findings from the reports. On the test dataset, a high Dice Score of 82% and low Hausdorff distance of 6.58 (pixels) was obtained for lesion segmentation. The proposed Text-Swin-UMamba model outperformed prior approaches: 37% improvement over the LLM-driven LanGuideMedSeg model (p < 0.001),and surpassed the purely image-based xLSTM-UNet and nnUNet models by 1.74% and 0.22%, respectively. The dataset and code can be accessed at https://github.com/ruida/LLM-Swin-UMamba

Problem

Research questions and friction points this paper is trying to address.

Integrating text with Swin-UMamba for lesion segmentation

Improving accuracy in CT lesion segmentation using LLMs

Combining imaging features with radiology report descriptions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates text with Swin-UMamba for segmentation

Uses LLMs to combine imaging and text features

Achieves high Dice Score and low Hausdorff distance

🔎 Similar Papers

No similar papers found.