🤖 AI Summary
To address the inherent trade-off in multimodal medical image super-resolution, where CNNs suffer from fixed local receptive fields while Transformers incur prohibitive computational costs for global modeling, this paper proposes a dual-branch Mamba architecture. The global branch leverages State Space Models (SSMs) to model long-range dependencies efficiently, while the local branch employs deformable convolution and a feature modulator to adaptively capture short-range structural details. The paper introduces the first global–local decoupled design of this kind, augmented by a multimodal feature fusion block and a novel Contrastive Edge Loss (CELoss), which together significantly enhance edge-texture recovery and cross-modal complementary modeling. Evaluated on MRI/PET datasets, the method achieves state-of-the-art performance, delivering substantial gains in PSNR and SSIM and markedly better structural fidelity and edge sharpness while maintaining linear inference complexity.
📝 Abstract
Convolutional neural networks and Transformers have made significant progress in multi-modality medical image super-resolution. However, these methods either have a fixed receptive field for local learning or incur a significant computational burden for global learning, limiting super-resolution performance. To solve this problem, State Space Models, notably Mamba, are introduced to efficiently model long-range dependencies in images with linear computational complexity. Building on Mamba and the observation that low-resolution images rely on global information to compensate for missing details, while high-resolution reference images must provide more local details for accurate super-resolution, we propose a global and local Mamba network (GLMamba) for multi-modality medical image super-resolution. Specifically, GLMamba is a two-branch network equipped with a global Mamba branch and a local Mamba branch. The global Mamba branch captures long-range relationships in low-resolution inputs, while the local Mamba branch focuses on short-range details in high-resolution reference images. We also employ a deform block to adaptively extract features in both branches and enhance their representation ability, and a modulator is designed to further refine the deformable features in both the global and local Mamba blocks. To fully exploit the reference image for low-resolution image super-resolution, we further develop a multi-modality feature fusion block that adaptively fuses features by considering similarities, differences, and complementary aspects between modalities. In addition, a contrastive edge loss (CELoss) is developed to further enhance edge textures and contrast in medical images.
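For readers who want a concrete picture of the two-branch layout, here is a minimal, hypothetical PyTorch sketch. Everything in it is an illustrative assumption rather than the authors' implementation: `SimpleSSM` is a toy diagonal state-space scan standing in for the real Mamba selective-scan blocks, and the local branch's window size, the deform block's offset/modulator layout, and the product/difference/concatenation fusion are all guesses at plausible instantiations of what the abstract describes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d

class SimpleSSM(nn.Module):
    """Toy diagonal state-space recurrence over a flattened sequence.
    Stand-in for a Mamba block (the real model uses a selective scan)."""
    def __init__(self, dim, state=8):
        super().__init__()
        self.logA = nn.Parameter(-torch.rand(dim, state))  # decay in (0, 1)
        self.B = nn.Parameter(torch.randn(dim, state) * 0.1)
        self.C = nn.Parameter(torch.randn(dim, state) * 0.1)

    def forward(self, x):                       # x: (B, L, D)
        A = torch.exp(self.logA)                # (D, S)
        h = x.new_zeros(x.shape[0], x.shape[2], A.shape[-1])
        ys = []
        for t in range(x.shape[1]):             # sequential linear recurrence
            h = h * A + x[:, t].unsqueeze(-1) * self.B
            ys.append((h * self.C).sum(-1))
        return torch.stack(ys, dim=1)           # (B, L, D)

class DeformBlock(nn.Module):
    """Deformable conv plus a learned sigmoid modulator, loosely mirroring
    the abstract's deform block and modulator."""
    def __init__(self, dim):
        super().__init__()
        self.offset = nn.Conv2d(dim, 2 * 3 * 3, 3, padding=1)  # (dy, dx) per tap
        self.dconv = DeformConv2d(dim, dim, 3, padding=1)
        self.mod = nn.Sequential(nn.Conv2d(dim, dim, 1), nn.Sigmoid())

    def forward(self, x):
        y = self.dconv(x, self.offset(x))
        return y * self.mod(y)

class MambaBranch(nn.Module):
    """window=0: one global scan over the whole image (LR input, long-range).
    window>0: independent scans inside small windows (reference, short-range)."""
    def __init__(self, dim, window=0):
        super().__init__()
        self.deform = DeformBlock(dim)
        self.norm = nn.LayerNorm(dim)
        self.ssm = SimpleSSM(dim)
        self.window = window

    def forward(self, x):                       # x: (B, D, H, W)
        x = self.deform(x)
        B, D, H, W = x.shape
        if self.window:                         # assumes H, W divisible by window
            w = self.window
            s = x.view(B, D, H // w, w, W // w, w)
            s = s.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, D)
            s = s + self.ssm(self.norm(s))
            s = s.view(B, H // w, W // w, w, w, D)
            return s.permute(0, 5, 1, 3, 2, 4).reshape(B, D, H, W)
        s = x.flatten(2).transpose(1, 2)        # (B, H*W, D)
        s = s + self.ssm(self.norm(s))
        return s.transpose(1, 2).view(B, D, H, W)

class FusionBlock(nn.Module):
    """Fuses the branches from similarity (product), difference, and
    complementary (raw reference) cues, echoing the abstract's wording."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Conv2d(3 * dim, dim, 1)

    def forward(self, f_lr, f_ref):
        return self.proj(torch.cat([f_lr * f_ref, f_lr - f_ref, f_ref], dim=1))

class GLMambaSketch(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.embed_lr = nn.Conv2d(1, dim, 3, padding=1)
        self.embed_ref = nn.Conv2d(1, dim, 3, padding=1)
        self.global_branch = MambaBranch(dim)            # LR: long-range
        self.local_branch = MambaBranch(dim, window=4)   # ref: short-range
        self.fuse = FusionBlock(dim)
        self.head = nn.Conv2d(dim, 1, 3, padding=1)

    def forward(self, lr, ref):
        # Bicubic pre-upsampling aligns the LR input with the reference grid.
        lr_up = F.interpolate(lr, size=ref.shape[-2:], mode="bicubic",
                              align_corners=False)
        f = self.fuse(self.global_branch(self.embed_lr(lr_up)),
                      self.local_branch(self.embed_ref(ref)))
        return lr_up + self.head(f)              # residual SR prediction

lr = torch.randn(1, 1, 16, 16)    # low-resolution target modality
ref = torch.randn(1, 1, 32, 32)   # high-resolution reference modality
print(GLMambaSketch()(lr, ref).shape)  # torch.Size([1, 1, 32, 32])
```

The point the sketch tries to capture is the asymmetry the abstract describes: the low-resolution input is scanned globally to borrow long-range context, while the high-resolution reference is scanned inside small windows so its contribution stays local.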
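The abstract names the contrastive edge loss but does not define it here, so the following is only one plausible construction: Sobel edge maps combined in a triplet-style ratio that pulls the super-resolved edges toward the ground-truth edges (positive pair) and away from the blurry bicubic upsample (negative pair). The paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def sobel_edges(img):
    """Edge magnitude from fixed Sobel filters (single-channel images)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(-1, -2)                  # Sobel kernel for the y-axis
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def contrastive_edge_loss(sr, hr, lr_up, eps=1e-6):
    """Hypothetical CELoss: pull SR edges toward the HR target and push
    them away from the blurry bicubic upsample of the LR input."""
    e_sr, e_hr, e_lr = sobel_edges(sr), sobel_edges(hr), sobel_edges(lr_up)
    pos = F.l1_loss(e_sr, e_hr)                # want small: match true edges
    neg = F.l1_loss(e_sr, e_lr)                # want large: escape blurriness
    return pos / (neg + eps)
```

In a full training objective this term would typically be weighted against a pixel reconstruction loss, e.g. `loss = F.l1_loss(sr, hr) + 0.1 * contrastive_edge_loss(sr, hr, lr_up)`, where the weight 0.1 is likewise a placeholder.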