🤖 AI Summary
Parotid lesion segmentation faces challenges including highly variable lesion morphology, ill-defined boundaries, and difficulty in obtaining precise prompts; moreover, existing methods inadequately incorporate clinical expert knowledge. To address these issues, we propose PG-SAM, the first text-guided adaptation of the Segment Anything Model (SAM) for medical imaging. PG-SAM converts unstructured clinical diagnostic reports into structured textual prompts to inject domain-specific prior knowledge. We design a cross-sequence attention mechanism to jointly model multi-sequence imaging data (e.g., T1-, T2-, and contrast-enhanced sequences) and textual prompts. Segmentation is performed end-to-end using SAM’s decoder. Evaluated on independent datasets from three clinical centers, PG-SAM significantly outperforms state-of-the-art methods (p < 0.01), achieving Dice score improvements of 3.2–5.8%. These results demonstrate the efficacy and generalizability of text-guided segmentation across multi-center, multi-sequence medical image analysis.
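For readers unfamiliar with the Dice score cited above, the following is a minimal sketch of its standard definition, 2|A ∩ B| / (|A| + |B|), for two binary masks. The function name `dice_score`, the flat-list mask representation, and the toy masks are illustrative assumptions; the paper's actual evaluation protocol (per-slice versus per-volume, thresholding, averaging) is not stated here.

```python
def dice_score(pred, target):
    """Dice coefficient between two flat binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as lesion in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly; define Dice as 1.0 in that case.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 6-pixel masks: 2 overlapping pixels, 3 foreground pixels each.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 0, 1, 1, 1, 0]
score = dice_score(pred, target)
```

Here the two masks share 2 foreground pixels out of 3 + 3, so the score is 2·2/6 ≈ 0.667.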
📝 Abstract
Parotid gland lesion segmentation is essential for the treatment of parotid gland diseases. However, because lesions vary widely in size and have complex boundaries, accurate segmentation remains challenging. Recently, fine-tuning the Segment Anything Model (SAM) has shown remarkable performance in medical image segmentation. Nevertheless, SAM's interactive segmentation relies heavily on precise lesion prompts (points, boxes, masks, etc.), which are very difficult to obtain in real-world applications. Moreover, current medical image segmentation methods run fully automatically and ignore the domain knowledge that medical experts bring to segmentation. To address these limitations, we propose the parotid gland segment anything model (PG-SAM), an expert diagnosis text-guided SAM that incorporates expert domain knowledge for cross-sequence parotid gland lesion segmentation. Specifically, we first propose an expert diagnosis report-guided prompt generation module that automatically generates prompt information containing prior domain knowledge to guide the subsequent lesion segmentation. Then, we introduce a cross-sequence attention module, which integrates the complementary information of different modalities to improve segmentation. Finally, the multi-sequence image features and the generated prompts are fed into the decoder to produce the segmentation result. Experimental results demonstrate that PG-SAM achieves state-of-the-art performance in parotid gland lesion segmentation across three independent clinical centers, validating its clinical applicability and the effectiveness of diagnostic text for enhancing image segmentation in real-world clinical settings.
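The abstract does not specify how the cross-sequence attention module is built. As a rough illustration of the general idea only, the dependency-free toy sketch below lets the feature tokens of each MRI sequence attend to the tokens of every other sequence using scaled dot-product attention with a residual connection. Everything here is an assumption for illustration: the function names, the sequence labels `T1`, `T2`, and `T1c`, the 2-dimensional toy tokens, and the absence of learned query/key/value projections. It is not PG-SAM's actual implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, context):
    """Each query token attends over all context tokens.

    Assumption: keys double as values and there are no learned
    projections, which a real module would normally include.
    """
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in context])
        # Attention-weighted sum of the context tokens.
        attended = [
            sum(w * k[d] for w, k in zip(weights, context))
            for d in range(len(q))
        ]
        # Residual connection keeps the sequence's own information.
        out.append([qi + ai for qi, ai in zip(q, attended)])
    return out


def fuse_sequences(features):
    """features maps a sequence name to its list of feature tokens.

    Each sequence attends to the tokens of all *other* sequences.
    """
    fused = {}
    for name, tokens in features.items():
        context = [
            t
            for other, toks in features.items()
            if other != name
            for t in toks
        ]
        fused[name] = cross_attend(tokens, context)
    return fused


# Hypothetical toy features: three sequences, two tokens each, dimension 2.
features = {
    "T1": [[1.0, 0.0], [0.0, 1.0]],
    "T2": [[0.5, 0.5], [1.0, 1.0]],
    "T1c": [[0.0, 2.0], [2.0, 0.0]],
}
fused = fuse_sequences(features)
```

The output keeps one token list per sequence with the same shape as the input, so in a pipeline of this kind the fused features could be passed to a mask decoder together with the text-derived prompts.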