🤖 AI Summary
In fine-grained lumbar MRI segmentation, coarse-grained strategies lose anatomical detail, while vision-only models lack semantic understanding, causing missegmentation. To address this, we propose an anatomy-aware multimodal segmentation framework. Our method introduces three novel components: (1) an Anatomy-aware Text Prompt Generator (ATPG) for clinically grounded textual guidance; (2) a Holistic Anatomy-aware Semantic Fusion (HASF) module for cross-modal alignment between text semantics and image features; and (3) a Channel-wise Contrastive Anatomy-aware Enhancement (CCAE) module for class-sensitive feature enhancement. By deeply integrating anatomical priors with textual prompts, the framework overcomes the limitations of unimodal visual modeling. Evaluated on the MRSpineSeg and SPIDER benchmarks, it achieves state-of-the-art performance: on SPIDER, Dice improves to 79.39% (+8.31%) and HD95 drops to 9.91 pixels (−4.14 pixels), demonstrating superior accuracy and robustness in segmenting vertebral bodies, intervertebral discs, and spinal canal substructures.
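The cross-modal alignment performed by HASF can be illustrated with a minimal cross-attention sketch, in which image features act as queries over text-prompt embeddings. This is an assumption-laden toy version in NumPy; the paper's actual module (projections, heads, normalization) is not specified here, and `cross_attention` is a hypothetical helper name.

```python
import numpy as np

def cross_attention(img_feats, txt_feats):
    """Fuse text semantics into image features via scaled dot-product
    cross-attention: image features are queries, text-prompt embeddings
    are keys/values. A minimal sketch of cross-modal alignment only."""
    d = img_feats.shape[-1]
    scores = img_feats @ txt_feats.T / np.sqrt(d)      # (N_pix, N_tok)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over text tokens
    attended = weights @ txt_feats                     # (N_pix, d)
    return img_feats + attended                        # residual fusion
```

Residual fusion keeps the original visual features intact while injecting anatomy-aware text context per spatial location.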
📝 Abstract
Accurate lumbar spine segmentation is crucial for diagnosing spinal disorders. Existing methods typically use coarse-grained segmentation strategies that lack the fine detail needed for precise diagnosis. Moreover, their reliance on vision-only models hinders the capture of anatomical semantics, leading to misclassified categories and poorly delineated boundaries. To address these limitations, we present ATM-Net, an innovative framework that employs an anatomy-aware, text-guided, multi-modal fusion mechanism for fine-grained segmentation of lumbar substructures, i.e., vertebral bodies (VBs), intervertebral discs (IDs), and the spinal canal (SC). ATM-Net adopts the Anatomy-aware Text Prompt Generator (ATPG) to adaptively convert image annotations into anatomy-aware prompts from different views. These insights are further integrated with image features via the Holistic Anatomy-aware Semantic Fusion (HASF) module, building a comprehensive anatomical context. The Channel-wise Contrastive Anatomy-aware Enhancement (CCAE) module further strengthens class discrimination and refines segmentation through class-wise, channel-level multi-modal contrastive learning. Extensive experiments on the MRSpineSeg and SPIDER datasets demonstrate that ATM-Net significantly outperforms state-of-the-art methods, with consistent improvements in both class discrimination and segmentation detail. For example, ATM-Net achieves a Dice of 79.39% and an HD95 of 9.91 pixels on SPIDER, surpassing the strong SpineParseNet baseline by 8.31% and 4.14 pixels, respectively.
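The class-wise contrastive objective behind CCAE can be sketched as a supervised InfoNCE loss over per-class feature vectors: same-class pairs are pulled together, different classes pushed apart. This simplified stand-in assumes cosine-similarity features and a temperature `tau`; the exact channel-level formulation is not given in the abstract, and `class_contrastive_loss` is a hypothetical name.

```python
import numpy as np

def class_contrastive_loss(feats, labels, tau=0.1):
    """Supervised InfoNCE over class feature vectors (simplified sketch).
    For each anchor i, positives are other samples with the same label;
    the loss is the mean of (logsumexp over all others - mean positive sim)."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)  # cosine sim
    sim = feats @ feats.T / tau
    n = len(labels)
    total, count = 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue  # no positive pair for this anchor
        logits = np.delete(sim[i], i)                 # exclude self-similarity
        m = logits.max()
        lse = m + np.log(np.exp(logits - m).sum())    # stable logsumexp
        total += lse - sum(sim[i][j] for j in pos) / len(pos)
        count += 1
    return total / max(count, 1)
```

Intuitively, a lower loss means features of the same anatomical class (e.g., all disc channels) cluster together while staying separated from other classes, which is the discrimination effect the abstract attributes to CCAE.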