🤖 AI Summary
State-space models (SSMs) face two obstacles in 3D medical image segmentation: flattening a volume into a 1D sequence breaks spatial continuity, and SSMs have difficulty fitting the high-variance data common in medical imaging. To address these limitations, we propose Meta Mamba UNet (MM-UNet), a U-shaped encoder-decoder architecture built from hybrid modules that place SSMs inside residual connections to reduce variance. A bidirectional scan order further alleviates the discontinuities introduced by flattening. On the AMOS2022 and Synapse benchmarks, MM-UNet achieves Dice scores of 91.0% and 87.1%, respectively; on AMOS2022 it surpasses nnUNet by 3.2%.
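The flattening problem and the bidirectional scan can be illustrated with a toy sketch. It is not MM-UNet's implementation. The abstract does not specify the scan order, so the sketch assumes raster (depth, row, column) flattening, substitutes a first-order linear recurrence for the SSM, and combines the forward and reversed scans by summation. All function names are hypothetical.

```python
# Hypothetical illustration only; not the MM-UNet code.


def flatten_index(d, h, w, H, W):
    """Raster-order position of voxel (d, h, w) in the flattened 1D sequence."""
    return (d * H + h) * W + w


def linear_scan(xs, a):
    """Toy causal recurrence h_t = a * h_{t-1} + x_t, standing in for an SSM."""
    out, h = [], 0.0
    for x in xs:
        h = a * h + x
        out.append(h)
    return out


def bidirectional_scan(xs, a):
    """Scan forward and over the reversed sequence, then sum the two outputs,
    so every position receives context from both ends of the sequence."""
    fwd = linear_scan(xs, a)
    bwd = linear_scan(xs[::-1], a)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


if __name__ == "__main__":
    H, W = 8, 8
    # Neighbours along W stay adjacent in the sequence ...
    print(flatten_index(0, 0, 1, H, W) - flatten_index(0, 0, 0, H, W))  # 1
    # ... but neighbours along depth end up H * W positions apart.
    print(flatten_index(1, 0, 0, H, W) - flatten_index(0, 0, 0, H, W))  # 64
    # An impulse at the last position is invisible to position 0 in a forward-only scan.
    xs = [0.0] * 5 + [1.0]
    print(linear_scan(xs, 0.5)[0])  # 0.0
    print(bidirectional_scan(xs, 0.5)[0])  # 0.03125, i.e. 0.5 ** 5
```

The forward-only scan is causal, so a voxel early in the sequence never sees a neighbour that flattening pushed to the end; adding the reversed scan is one simple way to restore two-sided context.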
📝 Abstract
State Space Models (SSMs) have recently demonstrated outstanding performance in long-sequence modeling, particularly in natural language processing. However, their direct application to medical image segmentation poses two challenges. First, SSMs were originally designed for 1D sequences and struggle with the 3D spatial structure of medical images, because flattening a volume into a sequence separates voxels that are adjacent in space. Second, SSMs have difficulty fitting high-variance data, which is common in medical imaging. In this paper, we analyze the intrinsic limitations of SSMs in medical image segmentation and propose a unified U-shaped encoder-decoder architecture, Meta Mamba UNet (MM-UNet), designed to leverage the advantages of SSMs while mitigating their drawbacks. MM-UNet incorporates hybrid modules that integrate SSMs within residual connections, reducing variance and improving performance. Furthermore, we introduce a novel bidirectional scan order strategy to alleviate the discontinuities that arise when processing medical images. Extensive experiments on the AMOS2022 and Synapse datasets demonstrate the superiority of MM-UNet over state-of-the-art methods. MM-UNet achieves a Dice score of 91.0% on AMOS2022, surpassing nnUNet by 3.2%, and a Dice score of 87.1% on Synapse. These results confirm the effectiveness of integrating SSMs into medical image segmentation through architectural design optimizations.
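A minimal sketch of placing an SSM inside a residual connection follows, on a single-channel 1D sequence. The abstract does not describe the hybrid module's internals, so the pre-normalization (`standardize`), the scalar SSM (`diagonal_ssm`), and its parameters are assumptions made for illustration. The sketch shows only the wiring; it does not reproduce the paper's variance-reduction result.

```python
import math

# Hypothetical illustration only; not the MM-UNet hybrid module.


def standardize(xs, eps=1e-5):
    """Assumed pre-normalization: zero mean, unit variance over the sequence."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]


def diagonal_ssm(xs, a=0.9, b=0.1, c=1.0):
    """Scalar linear SSM: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def residual_ssm_block(xs):
    """y = x + SSM(norm(x)): the identity path carries the raw input and
    the SSM branch adds a correction on top of it."""
    mixed = diagonal_ssm(standardize(xs))
    return [x + m for x, m in zip(xs, mixed)]


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    scaled = [10.0 * x + 5.0 for x in xs]  # same pattern, larger intensity and variance
    branch = [y - x for x, y in zip(xs, residual_ssm_block(xs))]
    branch_scaled = [y - x for x, y in zip(scaled, residual_ssm_block(scaled))]
    # With the assumed normalization, the SSM branch output is (almost) unchanged.
    print(max(abs(p - q) for p, q in zip(branch, branch_scaled)) < 1e-4)  # True
```

Because the identity path preserves the input, the SSM branch only has to model a residual correction; under the assumed normalization that branch also sees inputs of comparable scale whatever the intensity range of the scan.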