MambaCAFU: Hybrid Multi-Scale and Multi-Attention Model with Mamba-Based Fusion for Medical Image Segmentation

📅 2025-10-04
🤖 AI Summary
Medical image segmentation suffers from poor model generalizability and a persistent trade-off between accuracy and computational efficiency. To address these challenges, we propose a novel CNN–Transformer–Mamba tri-branch hybrid architecture. Our approach introduces the first Mamba-based Attention Fusion mechanism, synergistically integrating CNNs' local receptive fields, Transformers' global contextual modeling, and Mamba's efficient long-range sequential representation learning. Furthermore, we design a multi-scale attention decoder coupled with cross-scale collaborative attention gating to jointly enhance fine-grained local detail preservation and long-range dependency modeling. Extensive experiments on multiple public medical imaging benchmarks demonstrate that our method achieves state-of-the-art segmentation accuracy with significantly reduced computational overhead, while markedly improving cross-modality and cross-anatomical generalization. The source code and pre-trained models will be publicly released, facilitating clinical deployment.
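The summary above credits Mamba with "efficient long-range sequential representation learning". Since the paper's code is not yet released, the following is only a minimal numpy sketch of the generic diagonal state-space recurrence that underlies Mamba-style models, not the paper's MAF module; the function name and all parameters are hypothetical.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Minimal diagonal state-space recurrence (the linear scan at the
    heart of Mamba-style models): h_t = a * h_{t-1} + b * x_t, y_t = c . h_t.
    Runs in O(L * N) for sequence length L and state size N, versus the
    O(L^2) cost of full self-attention over the same sequence."""
    N = a.shape[0]
    h = np.zeros(N)                   # hidden state, one value per channel
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = a * h + b * x_t           # per-channel linear recurrence
        y[t] = c @ h                  # scalar read-out at each step
    return y

rng = np.random.default_rng(1)
L, N = 16, 4
x = rng.standard_normal(L)
a = np.full(N, 0.9)                   # decay < 1 keeps the scan stable
b = rng.standard_normal(N)
c = rng.standard_normal(N)
y = ssm_scan(x, a, b, c)
print(y.shape)  # (16,)
```

Because the recurrence is linear in the input, the scan scales with sequence length rather than its square, which is the efficiency argument the summary makes for the Mamba branch.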

📝 Abstract
In recent years, deep learning has shown near-expert performance in segmenting complex medical tissues and tumors. However, existing models are often task-specific, with performance varying across modalities and anatomical regions. Balancing model complexity and performance remains challenging, particularly in clinical settings where both accuracy and efficiency are critical. To address these issues, we propose a hybrid segmentation architecture featuring a three-branch encoder that integrates CNNs, Transformers, and a Mamba-based Attention Fusion (MAF) mechanism to capture local, global, and long-range dependencies. A multi-scale attention-based CNN decoder reconstructs fine-grained segmentation maps while preserving contextual consistency. Additionally, a co-attention gate enhances feature selection by emphasizing relevant spatial and semantic information across scales during both encoding and decoding, improving feature interaction and cross-scale communication. Extensive experiments on multiple benchmark datasets show that our approach outperforms state-of-the-art methods in accuracy and generalization, while maintaining comparable computational complexity. By effectively balancing efficiency and effectiveness, our architecture offers a practical and scalable solution for diverse medical imaging tasks. Source code and trained models will be publicly released upon acceptance to support reproducibility and further research.
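The abstract describes a co-attention gate that "enhances feature selection by emphasizing relevant spatial and semantic information across scales". With the authors' code still unreleased, here is a minimal numpy sketch of one plausible form of such a gate (additive attention gating in the style of Attention U-Net); the function and weight names are assumptions for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def co_attention_gate(enc_feat, dec_feat, w_e, w_d, w_psi):
    """Hypothetical additive co-attention gate: project encoder and
    decoder feature maps, combine them, and produce a [0, 1] spatial
    gate that re-weights the encoder (skip) features before fusion.
    Shapes: enc_feat, dec_feat (C, H, W); w_e, w_d (C_mid, C); w_psi (1, C_mid)."""
    e = np.einsum('oc,chw->ohw', w_e, enc_feat)   # 1x1-conv-style projection
    d = np.einsum('oc,chw->ohw', w_d, dec_feat)
    attn = np.maximum(e + d, 0.0)                 # ReLU on the joint projection
    gate = sigmoid(np.einsum('oc,chw->ohw', w_psi, attn))  # (1, H, W) in (0, 1)
    return enc_feat * gate                        # gated skip features

rng = np.random.default_rng(0)
C, H, W, C_mid = 4, 8, 8, 2
enc = rng.standard_normal((C, H, W))
dec = rng.standard_normal((C, H, W))
out = co_attention_gate(enc, dec,
                        rng.standard_normal((C_mid, C)),
                        rng.standard_normal((C_mid, C)),
                        rng.standard_normal((1, C_mid)))
print(out.shape)  # (4, 8, 8)
```

Because the gate is bounded in (0, 1), it can only attenuate encoder activations, which is how this family of gates suppresses irrelevant spatial regions while preserving the skip connection's resolution.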
Problem

Research questions and friction points this paper is trying to address.

Addresses task-specific limitations in medical image segmentation
Balances model complexity with accuracy and efficiency needs
Improves feature interaction across multiple scales and modalities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid CNN-Transformer-Mamba encoder captures local, global, and long-range dependencies
Multi-scale attention decoder reconstructs fine-grained maps
Co-attention gate enhances cross-scale feature selection
T-Mai Bui
University of the Basque Country UPV/EHU, San Sebastian, Spain
Fares Bougourzi
JUNIA Grande école d'ingénieurs
Electronics, computer science
Fadi Dornaika
IKERBASQUE Research Foundation
Computer vision, pattern recognition, machine learning
Vinh Truong Hoang
Ho Chi Minh City Open University, Vietnam