🤖 AI Summary
This work addresses the limitation of U-Net’s skip connections in medical image segmentation, which often introduce low-level noise and irrelevant features that degrade semantic accuracy, particularly in low-contrast regions. To mitigate this, the authors reformulate skip connection gating as a decoder-guided sparse feature selection problem and propose an explicit sparse gating mechanism. This mechanism integrates learnable channel-wise thresholds, decoder-driven channel attention, lightweight depthwise separable dilated convolutions, and an ℓ1 proximal operator to simultaneously suppress noise and preserve semantically meaningful features. The proposed method achieves state-of-the-art performance on both 2D and 3D medical image segmentation benchmarks, yielding a notable improvement of approximately 20% in Dice score on challenging 3D tasks.
📝 Abstract
Medical image segmentation commonly relies on U-shaped encoder-decoder architectures such as U-Net, where skip connections preserve fine spatial detail by injecting high-resolution encoder features into the decoder. However, these skip pathways also propagate low-level textures, background clutter, and acquisition noise, allowing irrelevant information to bypass deeper semantic filtering, an issue that is particularly detrimental in low-contrast clinical imaging. Although attention gates have been introduced to address this limitation, they typically produce dense sigmoid masks that softly reweight features rather than explicitly removing irrelevant activations. We propose ProSMA-UNet (Proximal-Sparse Multi-Scale Attention U-Net), which reformulates skip gating as a decoder-conditioned sparse feature selection problem. ProSMA constructs a multi-scale compatibility field using lightweight depthwise dilated convolutions to capture relevance across local and contextual scales, then enforces explicit sparsity via an $\ell_1$ proximal operator with learnable per-channel thresholds, yielding a closed-form soft-thresholding gate that can remove noisy responses. To further suppress semantically irrelevant channels, ProSMA incorporates decoder-conditioned channel gating driven by global decoder context. Extensive experiments on challenging 2D and 3D benchmarks demonstrate state-of-the-art performance, with particularly large gains ($\approx 20\%$) on difficult 3D segmentation tasks. Project page: https://math-ml-x.github.io/ProSMA-UNet/
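To make the closed-form gate concrete: the proximal operator of $\tau\,|x|$ is the standard soft-thresholding map $\mathrm{sign}(x)\max(|x| - \tau, 0)$, which sets every response with magnitude at most $\tau$ exactly to zero. The following is a minimal pure-Python sketch of that map applied with one threshold per channel. It is not the authors' implementation: the names `soft_threshold` and `sparse_gate`, the list-of-lists channel layout, and the fixed threshold values are assumptions made for illustration (in the paper the thresholds are learned, and the gate acts on the multi-scale compatibility field rather than on raw numbers).

```python
import math


def soft_threshold(value, tau):
    """Proximal operator of tau * |x|: shrink by tau, zero out anything in [-tau, tau]."""
    return math.copysign(max(abs(value) - tau, 0.0), value)


def sparse_gate(features, thresholds):
    """Apply soft-thresholding channel by channel.

    features:   list of channels, each a flat list of floats (hypothetical layout)
    thresholds: one non-negative threshold per channel (assumed fixed here;
                learnable per channel in the paper)
    """
    if len(features) != len(thresholds):
        raise ValueError("need exactly one threshold per channel")
    return [
        [soft_threshold(v, tau) for v in channel]
        for channel, tau in zip(features, thresholds)
    ]


# Two illustrative channels with made-up responses and thresholds.
features = [[0.9, -0.1, 0.3], [0.05, -0.6, 0.2]]
thresholds = [0.2, 0.25]
gated = sparse_gate(features, thresholds)

# For contrast, a dense sigmoid mask is strictly positive, so it can only
# down-weight a response and never remove it outright.
sigmoid_mask = [[1.0 / (1.0 + math.exp(-v)) for v in channel] for channel in features]
```

In this sketch the small responses (-0.1, 0.05, 0.2) fall inside their channel's threshold and become exactly zero, while the larger ones survive with reduced magnitude. That is the difference from sigmoid reweighting that the abstract emphasizes.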