AI Summary
Medical imaging anomaly detection faces two challenges: CNNs have limited receptive fields, and Transformers incur high computational overhead. To address these, we propose SpectMamba, the first Mamba-based architecture tailored for medical image anomaly detection, which integrates frequency-domain analysis with state-space modeling. Methodologically, we design a hybrid spatial-frequency attention module and leverage Hilbert curve scanning to preserve high-frequency details while capturing long-range dependencies; we also introduce a visual state-space module, learnable frequency-domain transformations, and linear-complexity sequence modeling to enable adaptive spatial-frequency feature fusion. Evaluated on multi-modal, multi-disease anomaly detection benchmarks, SpectMamba achieves state-of-the-art performance at significantly lower computational cost, demonstrating strong accuracy, efficiency, and generalization across diverse medical imaging tasks.
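The summary does not say how SpectMamba orders image patches along the Hilbert curve, so the following is only a minimal sketch of the general idea, using the textbook index-to-coordinate mapping for a square grid whose side is a power of two. The function names (`hilbert_index_to_xy`, `hilbert_scan`, `max_step`) are illustrative assumptions, not identifiers from the paper.

```python
def hilbert_index_to_xy(side, d):
    """Map position d along a Hilbert curve to (x, y) on a side x side grid.

    Assumes side is a power of two. This is the standard iterative mapping,
    not necessarily the scanning procedure used in SpectMamba.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the current quadrant so sub-curves join end to end.
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_scan(grid):
    """Flatten a square 2D grid (list of rows) into a 1D list in Hilbert order."""
    side = len(grid)
    if side == 0 or side & (side - 1) or any(len(row) != side for row in grid):
        raise ValueError("grid must be square with a power-of-two side")
    order = [hilbert_index_to_xy(side, d) for d in range(side * side)]
    return [grid[y][x] for x, y in order]


def max_step(coords):
    """Largest Manhattan distance between consecutive positions in a scan."""
    return max(
        abs(x1 - x0) + abs(y1 - y0)
        for (x0, y0), (x1, y1) in zip(coords, coords[1:])
    )


side = 8
hilbert = [hilbert_index_to_xy(side, d) for d in range(side * side)]
raster = [(x, y) for y in range(side) for x in range(side)]
print(max_step(hilbert))  # 1: consecutive tokens are always grid neighbours
print(max_step(raster))   # 8: a row-major scan jumps at the end of each row
```

The comparison at the end shows why such an ordering is attractive for a sequence model: neighbours in the 1D sequence are always neighbours in the image, whereas a row-major scan breaks locality at every row boundary.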
Abstract
Abnormality detection in medical imaging is a critical task requiring both high efficiency and accuracy to support effective diagnosis. While convolutional neural networks (CNNs) and Transformer-based models are widely used, both face intrinsic challenges: CNNs have limited receptive fields, restricting their ability to capture broad contextual information, and Transformers encounter prohibitive computational costs when processing high-resolution medical images. Mamba, a recent innovation in natural language processing, has gained attention for its ability to process long sequences with linear complexity, offering a promising alternative. Building on this foundation, we present SpectMamba, the first Mamba-based architecture designed for medical image detection. A key component of SpectMamba is the Hybrid Spatial-Frequency Attention (HSFA) block, which separately learns high- and low-frequency features. This approach effectively mitigates the loss of high-frequency information caused by frequency bias and correlates frequency-domain features with spatial features, thereby enhancing the model's ability to capture global context. To better model long-range dependencies, we propose the Visual State-Space Module (VSSM) and introduce a novel Hilbert Curve Scanning technique to strengthen spatial correlations and local dependencies, further optimizing the Mamba framework. Comprehensive experiments show that SpectMamba achieves state-of-the-art performance while remaining efficient across various medical image detection tasks.
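The abstract does not describe the internals of the HSFA block, so the sketch below only illustrates the underlying notion of separating an image into low- and high-frequency components. It uses a fixed square mask in the discrete Fourier domain rather than anything learnable, and the names `dft2`, `split_frequencies`, and `cutoff` are assumptions made for this example.

```python
import cmath


def dft2(grid, inverse=False):
    """Naive 2D DFT of a square list of lists; O(n^4), for illustration only."""
    n = len(grid)
    sign = 1 if inverse else -1
    out = [[0j] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0j
            for x in range(n):
                for y in range(n):
                    angle = sign * 2 * cmath.pi * (u * x + v * y) / n
                    total += grid[x][y] * cmath.exp(1j * angle)
            out[u][v] = total / (n * n) if inverse else total
    return out


def split_frequencies(image, cutoff):
    """Split a real square image into (low, high) bands that sum to the image.

    A coefficient counts as low-frequency when both of its wrapped frequency
    indices are at most `cutoff` (a hypothetical, hand-picked threshold).
    """
    n = len(image)
    spectrum = dft2(image)
    low_spec = [[0j] * n for _ in range(n)]
    high_spec = [[0j] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            # Indices above n/2 represent negative frequencies, so wrap them.
            fu, fv = min(u, n - u), min(v, n - v)
            target = low_spec if max(fu, fv) <= cutoff else high_spec
            target[u][v] = spectrum[u][v]
    # The mask is symmetric, so both bands are real up to rounding error.
    low = [[z.real for z in row] for row in dft2(low_spec, inverse=True)]
    high = [[z.real for z in row] for row in dft2(high_spec, inverse=True)]
    return low, high


# Toy 8x8 "image": a smooth ramp plus one sharp bright pixel.
image = [[0.1 * (x + y) for y in range(8)] for x in range(8)]
image[3][4] += 5.0
low, high = split_frequencies(image, cutoff=1)
```

In this toy decomposition the smooth ramp ends up mostly in `low`, while much of the isolated bright pixel's energy lands in `high`; fine detail of this kind is what the abstract says is at risk of being lost to frequency bias.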