TK-Mamba: Marrying KAN with Mamba for Text-Driven 3D Medical Image Segmentation

📅 2025-05-24

📈 Citations: 0

✨ Influential: 0

career value

206K/year

🤖 AI Summary

Addressing the dual challenges of modeling long-range dependencies and low computational efficiency in 3D medical image segmentation, this paper proposes the first multimodal framework integrating Mamba-based sequential modeling with the Kolmogorov–Arnold Network (KAN). Our key contributions are: (1) a novel 3D Grouped Rational KAN (3D-GR-KAN) module—the first application of grouped rational KANs to 3D volumetric data—achieving superior expressivity with parameter efficiency; (2) an Enhanced Gated Spatial Convolution (EGSC) operator that strengthens local-global spatial awareness; and (3) a dual-path CLIP-guided text-driven mechanism enabling semantic consistency and lesion-level fine-grained alignment. Evaluated on the MSD and KiTS23 benchmarks, our method achieves state-of-the-art performance—improving Dice score by +3.2% and inference speed by ×2.1 (FPS)—while supporting lightweight clinical deployment. The code is publicly available.

Technology Category

Application Category

📝 Abstract

3D medical image segmentation is vital for clinical diagnosis and treatment but is challenged by high-dimensional data and complex spatial dependencies. Traditional single-modality networks, such as CNNs and Transformers, are often limited by computational inefficiency and constrained contextual modeling in 3D settings. We introduce a novel multimodal framework that leverages Mamba and Kolmogorov-Arnold Networks (KAN) as an efficient backbone for long-sequence modeling. Our approach features three key innovations: First, an EGSC (Enhanced Gated Spatial Convolution) module captures spatial information when unfolding 3D images into 1D sequences. Second, we extend Group-Rational KAN (GR-KAN), a Kolmogorov-Arnold Networks variant with rational basis functions, into 3D-Group-Rational KAN (3D-GR-KAN) for 3D medical imaging - its first application in this domain - enabling superior feature representation tailored to volumetric data. Third, a dual-branch text-driven strategy leverages CLIP's text embeddings: one branch swaps one-hot labels for semantic vectors to preserve inter-organ semantic relationships, while the other aligns images with detailed organ descriptions to enhance semantic alignment. Experiments on the Medical Segmentation Decathlon (MSD) and KiTS23 datasets show our method achieving state-of-the-art performance, surpassing existing approaches in accuracy and efficiency. This work highlights the power of combining advanced sequence modeling, extended network architectures, and vision-language synergy to push forward 3D medical image segmentation, delivering a scalable solution for clinical use. The source code is openly available at https://github.com/yhy-whu/TK-Mamba.

Problem

Research questions and friction points this paper is trying to address.

Addresses high-dimensional 3D medical image segmentation challenges

Overcomes computational inefficiency in CNNs and Transformers

Enhances semantic alignment via text-driven dual-branch strategy

Innovation

Methods, ideas, or system contributions that make the work stand out.

EGSC module captures spatial information efficiently

3D-GR-KAN enhances feature representation for volumetric data

Dual-branch text-driven strategy improves semantic alignment

🔎 Similar Papers

One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts