🤖 AI Summary
To address three challenges in medical image segmentation (modeling long-range dependencies, high computational overhead, and insufficient multi-scale semantic alignment), this paper proposes SAMA-UNet, a novel U-shaped network architecture. Methodologically, it introduces two key components: (1) the Self-Adaptive Mamba-like Aggregated Attention (SAMA) block, which integrates contextual self-attention with dynamic weight modulation to enable context-aware and efficient feature modulation; and (2) the Causal-Resonance Multi-Scale Module (CR-MSM), which addresses the mismatch between the autoregressive assumptions of conventional State Space Models (SSMs) and 2D image tokens by enabling cross-scale causal alignment between encoder and decoder features. The design jointly captures fine-grained local details and global semantics. Extensive experiments on MRI, CT, and endoscopic datasets demonstrate consistent superiority over CNN-, Transformer-, and Mamba-based baselines, achieving average Dice score improvements of 2.1–4.7%. The source code is publicly available.
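For readers unfamiliar with the metric quoted above, the Dice score measures the overlap between a predicted mask and a reference mask. The following is a minimal sketch for flat binary masks, not the paper's evaluation code; the function name `dice_score` and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for flat binary masks (lists of 0/1).

    Hypothetical helper for illustration only; not taken from the paper's code.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Overlap: positions where both masks are 1.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    # One overlapping pixel, two predicted, one in the reference.
    print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
```

A score of 1.0 means the masks coincide exactly and 0.0 means they share no foreground pixels.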
📝 Abstract
Medical image segmentation plays an important role in various clinical applications, but existing models often struggle with computational inefficiency and with the challenges posed by complex medical data. State Space Sequence Models (SSMs) have shown promise in modeling long-range dependencies with linear computational complexity, yet their application to medical image segmentation remains hindered by incompatibilities with image tokens and by autoregressive assumptions. Moreover, it is difficult to balance the capture of local fine-grained information against global semantic dependencies. To address these challenges, we introduce SAMA-UNet, a novel architecture for medical image segmentation. A key innovation is the Self-Adaptive Mamba-like Aggregated Attention (SAMA) block, which integrates contextual self-attention with dynamic weight modulation to prioritise the most relevant features based on local and global contexts. This approach reduces computational complexity and improves the representation of complex image features across multiple scales. We also propose the Causal-Resonance Multi-Scale Module (CR-MSM), which enhances the flow of information between the encoder and decoder through causal resonance learning. This mechanism allows the model to automatically adjust feature resolution and causal dependencies across scales, leading to better semantic alignment between low-level and high-level features in U-shaped architectures. Experiments on MRI, CT, and endoscopy images show that SAMA-UNet achieves higher segmentation accuracy than existing CNN-, Transformer-, and Mamba-based methods. The implementation is publicly available on GitHub.
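The abstract's remarks about linear complexity and autoregressive assumptions can be illustrated with a toy example. The sketch below is not SAMA-UNet or Mamba; it is a scalar discrete state-space recurrence, h_t = a·h_{t-1} + b·x_t and y_t = c·h_t, with a hypothetical function name `ssm_scan` and arbitrary parameter values. It shows that one pass over the sequence suffices (cost linear in length) and that each output depends only on earlier inputs, which is the causal constraint that sits awkwardly with 2D image tokens.

```python
def ssm_scan(x, a=0.9, b=1.0, c=1.0):
    """Run a scalar linear state-space recurrence over a 1D sequence.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    Illustrative assumption: a, b, c are fixed scalars; real SSM layers
    use learned (and, in Mamba, input-dependent) matrices.
    """
    h = 0.0
    y = []
    for x_t in x:              # single pass: O(len(x)) time
        h = a * h + b * x_t    # state summarises all inputs seen so far
        y.append(c * h)        # output at step t uses only x_0..x_t
    return y


if __name__ == "__main__":
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5))
    # Changing a later input leaves earlier outputs unchanged (causality).
    print(ssm_scan([1.0, 0.0, 5.0], a=0.5)[:2])
```

Because image patches have no natural left-to-right order, flattening a 2D feature map into such a sequence lets each token see only the tokens scanned before it, which is the limitation the abstract refers to.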