🤖 AI Summary
This work addresses two limitations of existing Mamba-based medical image segmentation methods: pixel-wise directional scanning disrupts local two-dimensional spatial structure, and fusing scan directions by simple summation adapts poorly to complex lesion morphologies. To overcome these issues, the authors propose Patch-MoE Mamba, an architecture whose hierarchical patch-ordered scanning preserves local spatial neighborhoods while capturing multi-scale contextual information. An adaptive fusion module based on Mixture of Experts (MoE) then combines the multi-directional features dynamically, using four directional experts, a learnable concatenation expert, and residual directional aggregation. Experiments on five polyp segmentation datasets and the ISIC 2017/2018 skin lesion segmentation benchmarks support the method's effectiveness and generality.
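To make the contrast with plain summation concrete, the following is a minimal sketch of gated fusion over five experts plus a residual sum. It is not the authors' implementation: the function name `moe_fuse`, the choice of identity directional experts, the linear concatenation expert, and the softmax gate are all assumptions made for illustration, and the gate logits are passed in directly, whereas a trained model would presumably compute them from the input features.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_fuse(directional, concat_weight, gate_logits):
    """Fuse four directional feature vectors (hypothetical sketch).

    directional:   four lists of length C, one per scan direction
    concat_weight: C rows of length 4*C (assumed linear concatenation expert)
    gate_logits:   five floats, one per expert (assumed softmax gate)
    """
    channels = len(directional[0])
    # Concatenation expert: project the 4*C concatenated features back to C.
    concat = [v for feat in directional for v in feat]
    concat_expert = [sum(w * v for w, v in zip(row, concat)) for row in concat_weight]
    # Assumption: each directional expert passes its own direction through unchanged.
    experts = list(directional) + [concat_expert]
    gates = softmax(gate_logits)
    mixed = [sum(g * e[c] for g, e in zip(gates, experts)) for c in range(channels)]
    # Residual directional aggregation: the plain sum of the four directions.
    residual = [sum(feat[c] for feat in directional) for c in range(channels)]
    return [m + r for m, r in zip(mixed, residual)]


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
zero_projection = [[0.0] * 8 for _ in range(2)]
fused = moe_fuse(features, zero_projection, [0.0] * 5)
```

With uniform gate logits each expert receives weight 0.2, so the output is the residual sum plus a uniformly weighted mixture; unequal logits let one direction dominate, which summation alone cannot express.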
📝 Abstract
CNN- and Transformer-based architectures have achieved strong performance in medical image segmentation, but CNNs are limited in modeling long-range dependencies, while Transformers often suffer from quadratic computational and memory complexity. State space models, especially Mamba-based networks, offer an efficient alternative with linear sequence complexity. However, existing Mamba segmentation models still face two limitations: pixel-wise directional scanning can disrupt local 2D spatial structure, and simple summation-based fusion of scan directions cannot adapt well to diverse object sizes, shapes, and boundaries. To address these issues, we propose Patch-MoE Mamba, a patch-ordered mixture-of-experts state space architecture for medical image segmentation. It introduces a hierarchical patch-ordered scanning mechanism that preserves local spatial neighborhoods while capturing multi-scale context, and an MoE-based directional fusion module that adaptively combines multiple Mamba scanner outputs using four directional experts, a learnable concatenation expert, and residual directional aggregation. Experiments on five public polyp segmentation benchmarks and the ISIC 2017/2018 skin lesion segmentation datasets demonstrate the effectiveness and generality of Patch-MoE Mamba.
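As a rough illustration of what a patch-ordered scan could look like, the sketch below builds the visiting order for a small feature map: pixels are read row by row inside each patch, and patches are visited row by row across the image. The function name `patch_ordered_indices`, the row-major ordering at both levels, and the patch sizes are assumptions for illustration; the abstract does not specify the exact ordering or the directions used.

```python
def patch_ordered_indices(height, width, patch):
    """Return flat pixel indices in patch-by-patch order (hypothetical sketch).

    Pixels inside a patch stay adjacent in the sequence, unlike a plain
    row-major scan where vertical neighbors are `width` steps apart.
    """
    if height % patch or width % patch:
        raise ValueError("height and width must be divisible by patch")
    order = []
    for patch_row in range(0, height, patch):
        for patch_col in range(0, width, patch):
            for row in range(patch_row, patch_row + patch):
                for col in range(patch_col, patch_col + patch):
                    order.append(row * width + col)
    return order


order = patch_ordered_indices(4, 4, 2)
print(order)  # [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]

# One possible reading of "hierarchical": repeat the scan at several patch sizes.
multi_scale = {size: patch_ordered_indices(8, 8, size) for size in (2, 4)}
```

In the 4x4 example, pixel 0 and its vertical neighbor 4 end up two positions apart in the sequence instead of four, which is the locality property the abstract attributes to patch-ordered scanning.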