🤖 AI Summary
In fine-grained lumbar MRI segmentation, coarse-grained strategies lose anatomical detail, while vision-only models lack semantic understanding, causing missegmentation. To address this, we propose an anatomy-aware multimodal segmentation framework. Our method introduces three novel components: (1) an Anatomy-aware Text Prompt Generator (ATPG) for clinically grounded textual guidance; (2) a Holistic Anatomy-aware Semantic Fusion (HASF) module for cross-modal alignment between text semantics and image features; and (3) a Channel-wise Contrastive Anatomy-aware Enhancement (CCAE) module for class-sensitive feature enhancement. By deeply integrating anatomical priors with textual prompts, the framework overcomes the limitations of unimodal visual modeling. Evaluated on the MRSpineSeg and SPIDER benchmarks, it achieves state-of-the-art performance: on SPIDER, Dice improves to 79.39% (+8.31%) and HD95 drops to 9.91 pixels (−4.14 pixels), demonstrating superior accuracy and robustness in segmenting vertebral bodies, intervertebral discs, and spinal canal substructures.
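The cross-modal alignment performed by HASF can be illustrated with a minimal cross-attention sketch, in which image features act as queries over text-prompt embeddings. This is an assumption-laden toy version in NumPy; the paper's actual module (projections, heads, normalization) is not specified here, and `cross_attention` is a hypothetical helper name.

```python
import numpy as np

def cross_attention(img_feats, txt_feats):
    """Fuse text semantics into image features via scaled dot-product
    cross-attention: image features are queries, text-prompt embeddings
    are keys/values. A minimal sketch of cross-modal alignment only."""
    d = img_feats.shape[-1]
    scores = img_feats @ txt_feats.T / np.sqrt(d)      # (N_pix, N_tok)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over text tokens
    attended = weights @ txt_feats                     # (N_pix, d)
    return img_feats + attended                        # residual fusion
```

Residual fusion keeps the original visual features intact while injecting anatomy-aware text context per spatial location.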
📝 Abstract
Accurate lumbar spine segmentation is crucial for diagnosing spinal disorders. Existing methods typically use coarse-grained segmentation strategies that lack the fine detail needed for precise diagnosis. Moreover, their reliance on vision-only models hinders the capture of anatomical semantics, leading to misclassified categories and poorly delineated boundaries. To address these limitations, we present ATM-Net, an innovative framework that employs an anatomy-aware, text-guided, multi-modal fusion mechanism for fine-grained segmentation of lumbar substructures, i.e., vertebral bodies (VBs), intervertebral discs (IDs), and the spinal canal (SC). ATM-Net adopts the Anatomy-aware Text Prompt Generator (ATPG) to adaptively convert image annotations into anatomy-aware prompts from different views. These insights are further integrated with image features via the Holistic Anatomy-aware Semantic Fusion (HASF) module, building a comprehensive anatomical context. The Channel-wise Contrastive Anatomy-aware Enhancement (CCAE) module further strengthens class discrimination and refines segmentation through class-wise, channel-level multi-modal contrastive learning. Extensive experiments on the MRSpineSeg and SPIDER datasets demonstrate that ATM-Net significantly outperforms state-of-the-art methods, with consistent improvements in both class discrimination and segmentation detail. For example, ATM-Net achieves a Dice of 79.39% and an HD95 of 9.91 pixels on SPIDER, surpassing the strong SpineParseNet baseline by 8.31% and 4.14 pixels, respectively.
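The class-wise contrastive objective behind CCAE can be sketched as a supervised InfoNCE loss over per-class feature vectors: same-class pairs are pulled together, different classes pushed apart. This simplified stand-in assumes cosine-similarity features and a temperature `tau`; the exact channel-level formulation is not given in the abstract, and `class_contrastive_loss` is a hypothetical name.

```python
import numpy as np

def class_contrastive_loss(feats, labels, tau=0.1):
    """Supervised InfoNCE over class feature vectors (simplified sketch).
    For each anchor i, positives are other samples with the same label;
    the loss is the mean of (logsumexp over all others - mean positive sim)."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)  # cosine sim
    sim = feats @ feats.T / tau
    n = len(labels)
    total, count = 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue  # no positive pair for this anchor
        logits = np.delete(sim[i], i)                 # exclude self-similarity
        m = logits.max()
        lse = m + np.log(np.exp(logits - m).sum())    # stable logsumexp
        total += lse - sum(sim[i][j] for j in pos) / len(pos)
        count += 1
    return total / max(count, 1)
```

Intuitively, a lower loss means features of the same anatomical class (e.g., all disc channels) cluster together while staying separated from other classes, which is the discrimination effect the abstract attributes to CCAE.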