🤖 AI Summary
Medical image segmentation suffers from poor model generalizability and a persistent trade-off between accuracy and computational efficiency. To address these challenges, we propose a novel CNN–Transformer–Mamba tri-branch hybrid architecture. Our approach introduces the first Mamba-based Attention Fusion mechanism, synergistically integrating CNNs’ local receptive fields, Transformers’ global contextual modeling, and Mamba’s efficient long-range sequential representation learning. Furthermore, we design a multi-scale attention decoder coupled with cross-scale collaborative attention gating to jointly enhance fine-grained local detail preservation and long-range dependency modeling. Extensive experiments on multiple public medical imaging benchmarks demonstrate that our method achieves state-of-the-art segmentation accuracy at comparable computational cost, while markedly improving cross-modality and cross-anatomical generalization. The source code and pre-trained models will be publicly released to facilitate clinical deployment.
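The paper does not spell out the tri-branch fusion here, but the idea of combining a local (CNN-style), a global (Transformer-style), and a sequential (Mamba-style) feature stream via attention weights can be illustrated with a toy NumPy sketch. All three branch implementations below are simplified stand-ins, not the paper's actual modules: the "Mamba" branch is reduced to a gated linear recurrence, and the fusion score is a simple per-branch statistic.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C = 32, 8                     # sequence length (flattened patches), channels
x = rng.standard_normal((N, C))

def local_branch(x, k=3):
    """CNN stand-in: average over a local 1-D window (local receptive field)."""
    pad = np.pad(x, ((k // 2, k // 2), (0, 0)), mode="edge")
    return np.stack([pad[i:i + k].mean(0) for i in range(len(x))])

def global_branch(x):
    """Transformer stand-in: single-head self-attention with Q = K = V = x."""
    s = x @ x.T / np.sqrt(x.shape[1])
    a = np.exp(s - s.max(1, keepdims=True))
    a /= a.sum(1, keepdims=True)
    return a @ x

def sequential_branch(x, decay=0.9):
    """Mamba stand-in: linear recurrence h_t = decay*h_{t-1} + (1-decay)*x_t,
    a much-simplified proxy for selective state-space scanning."""
    h = np.zeros_like(x)
    state = np.zeros(x.shape[1])
    for t in range(len(x)):
        state = decay * state + (1 - decay) * x[t]
        h[t] = state
    return h

def attention_fuse(branches):
    """Softmax over one scalar score per branch, then a weighted sum of features."""
    feats = np.stack(branches)                  # (3, N, C)
    scores = np.abs(feats).mean(axis=(1, 2))    # hypothetical per-branch score
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return np.tensordot(w, feats, axes=(0, 0))  # (N, C) fused features

fused = attention_fuse([local_branch(x), global_branch(x), sequential_branch(x)])
print(fused.shape)  # (32, 8)
```

In the real architecture each branch would operate on 2-D/3-D feature maps and the fusion weights would be learned; the sketch only shows how three complementary representations of the same tokens can be combined through a single attention-weighted sum.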
📝 Abstract
In recent years, deep learning has shown near-expert performance in segmenting complex medical tissues and tumors. However, existing models are often task-specific, with performance varying across modalities and anatomical regions. Balancing model complexity and performance remains challenging, particularly in clinical settings where both accuracy and efficiency are critical. To address these issues, we propose a hybrid segmentation architecture featuring a three-branch encoder that integrates CNNs, Transformers, and a Mamba-based Attention Fusion (MAF) mechanism to capture local, global, and long-range dependencies. A multi-scale attention-based CNN decoder reconstructs fine-grained segmentation maps while preserving contextual consistency. Additionally, a co-attention gate enhances feature selection by emphasizing relevant spatial and semantic information during both encoding and decoding, strengthening feature interaction and cross-scale communication. Extensive experiments on multiple benchmark datasets show that our approach outperforms state-of-the-art methods in accuracy and generalization, while maintaining comparable computational complexity. By effectively balancing efficiency and effectiveness, our architecture offers a practical and scalable solution for diverse medical imaging tasks. Source code and trained models will be publicly released upon acceptance to support reproducibility and further research.
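The co-attention gate is described only at a high level. One common realization of such gating is additive attention in the style of attention-gated U-Nets: decoder context produces per-position coefficients in (0, 1) that suppress irrelevant encoder (skip) features. A minimal NumPy sketch, with all weight matrices and shapes hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_gate(x, g, Wx, Wg, psi):
    """Gate skip features x (N, Cx) with decoder context g (N, Cg).

    Additive attention: alpha = sigmoid(relu(x Wx + g Wg) psi), then the
    skip features are scaled elementwise by their attention coefficient.
    """
    q = np.maximum(x @ Wx + g @ Wg, 0.0)        # (N, Ci) joint features
    alpha = 1.0 / (1.0 + np.exp(-(q @ psi)))    # (N, 1) coefficients in (0, 1)
    return x * alpha                            # gated skip features

# Toy dimensions: N positions, skip/context/intermediate channel counts.
N, Cx, Cg, Ci = 16, 8, 4, 6
x = rng.standard_normal((N, Cx))
g = rng.standard_normal((N, Cg))
Wx = rng.standard_normal((Cx, Ci)) * 0.1
Wg = rng.standard_normal((Cg, Ci)) * 0.1
psi = rng.standard_normal((Ci, 1)) * 0.1

out = attention_gate(x, g, Wx, Wg, psi)
print(out.shape)  # (16, 8)
```

Because the coefficients lie in (0, 1), the gate can only attenuate skip features, never amplify them; applying such gates at every scale is one plausible way to implement the cross-scale feature selection the abstract describes.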