USEMA: a Scalable Efficient Mamba Like Attention for Medical Image Segmentation

📅 2026-05-11
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses two limitations of conventional self-attention in medical image segmentation: its quadratic computational cost in the input size and its limited ability to integrate local and global context. The authors propose USEMA, a hybrid UNet architecture that combines convolutional neural networks (CNNs) with Scalable and Efficient Mamba-like Attention (SEMA). SEMA uses local window attention to counter attention dispersion and keep attention focused, and it captures global context through arithmetic averaging. Experiments across a variety of imaging modalities and image sizes show improved computational efficiency compared with transformer-based models that use full self-attention, and better segmentation performance than purely convolutional and Mamba-based models.
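The idea of pairing local window attention with a global arithmetic average can be illustrated with a small toy sketch. This is not the authors' SEMA implementation: it works on a 1D token sequence, uses identity query/key/value projections, and combines the local and global branches by simple addition, all of which are assumptions made only for illustration. The function name `window_attention_with_global_mean` is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def window_attention_with_global_mean(tokens, window):
    """Toy sketch (assumption, not the paper's method).

    tokens: list of d-dimensional vectors (lists of floats)
    window: number of tokens per non-overlapping local window
    """
    n, d = len(tokens), len(tokens[0])

    # Global branch: plain arithmetic mean over all tokens.
    global_mean = [sum(t[j] for t in tokens) / n for j in range(d)]

    out = []
    for start in range(0, n, window):
        block = tokens[start:start + window]
        for q in block:
            # Local branch: softmax attention restricted to the window.
            weights = softmax([dot(q, k) / math.sqrt(d) for k in block])
            local = [sum(w * v[j] for w, v in zip(weights, block))
                     for j in range(d)]
            # Assumed fusion rule: add local and global context.
            out.append([l + g for l, g in zip(local, global_mean)])
    return out


if __name__ == "__main__":
    toy_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    for row in window_attention_with_global_mean(toy_tokens, window=2):
        print(row)
```

Each token only scores against the tokens in its own window, so the attention weights cannot spread thinly over the whole sequence, while the mean vector gives every token the same cheap summary of the full input.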
📝 Abstract
Accurate medical image segmentation is an integral part of the medical image analysis pipeline that requires the ability to merge local and global information. While vision transformers are able to capture global interactions using vanilla self-attention, their quadratic computational complexity in the input size remains a challenge for medical image segmentation tasks. Motivated by the dispersion property of vanilla self-attention and the recent development of the Mamba form of attention, Scalable and Efficient Mamba-like Attention (SEMA) uses token localization via local window attention to avoid dispersion and maintain focus, complemented by theoretically consistent arithmetic averaging to capture the global aspect of attention. In this work, we present USEMA, a hybrid UNet architecture that merges the local feature extraction ability of convolutional neural networks (CNNs) with SEMA attention. We conduct experiments with USEMA across a variety of modalities and image sizes, demonstrating improved computational efficiency compared to transformer-based models using full self-attention, and superior segmentation performance relative to purely convolutional and Mamba-based models.
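To make the quadratic-cost point concrete, the sketch below counts query-key score computations for full self-attention versus attention restricted to non-overlapping square windows. The image size, the window size, and the one-token-per-pixel assumption are all hypothetical choices for illustration; the paper's actual tokenization and window configuration are not given here.

```python
def attention_pair_counts(height, width, window):
    """Count query-key pairs for full vs. windowed attention.

    Assumes one token per spatial position and non-overlapping
    window x window blocks (illustrative assumptions only).
    """
    if height % window or width % window:
        raise ValueError("height and width must be divisible by window")

    n = height * width
    full = n * n  # every token attends to every token

    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    local = num_windows * tokens_per_window ** 2  # attention within windows only

    return full, local


if __name__ == "__main__":
    full, local = attention_pair_counts(256, 256, window=8)
    print(full, local, full // local)  # 4294967296 4194304 1024
```

With a fixed window size, the windowed count grows linearly in the number of tokens (it equals the token count times the tokens per window), whereas the full count grows quadratically.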
Problem

Research questions and friction points this paper is trying to address.

medical image segmentation
self-attention
computational complexity
local-global information fusion
vision transformers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mamba-like attention
local window attention
medical image segmentation
hybrid UNet
computational efficiency