🤖 AI Summary
Detecting small underwater sonar targets remains challenging because of sparse pixel representation, low acoustic contrast, and scale ambiguity, and existing methods are limited in noise suppression and multi-scale semantic alignment. To address these issues, this work proposes MambaDSF, a hybrid framework that integrates the Mamba state space model (SSM) architecture with a feature pyramid network. The approach introduces a Mamba Enhanced Feature Pyramid (MambaEFP) backbone and a Dilate Fusion Mamba (DFMamba) encoder, complemented by a Scale-Adaptive Weighted IoU (SA-WIoU) loss and a Cross-Scale Coherence (CSC) loss. Evaluated on the UATD forward-looking sonar dataset, the method achieves 91.5% mAP50 with 28.7 million parameters, surpassing all compared detectors, with a gain of 2.2 percentage points on a small-target subset. Cross-domain evaluation on FLS and MD-FLS indicates that the architecture generalizes beyond UATD.
📝 Abstract
Sonar imaging is the primary modality for underwater target detection, yet small targets remain difficult to detect due to insufficient pixel coverage, low acoustic contrast, and scale ambiguity across imaging ranges. CNN-based detectors extract local features efficiently but cannot suppress noise-induced false alarms without global acoustic context. Transformer-based methods capture long-range dependencies, but at quadratic computational cost. Existing Mamba-based vision models offer efficient linear-cost scanning but lack the multi-scale semantic alignment across pyramid levels, multi-receptive-field fusion, and small-target-aware training supervision that reliable sonar detection requires.
This letter proposes Mamba Dilated-Scale Fusion (MambaDSF), a hybrid framework addressing these limitations through three contributions: a Mamba Enhanced Feature Pyramid (MambaEFP) backbone that jointly captures local echo cues and global acoustic context at linear complexity; a Dilate Fusion Mamba (DFMamba) encoder that enforces multi-scale feature alignment across pyramid levels; and Scale-Adaptive Weighted IoU (SA-WIoU) and Cross-Scale Coherence (CSC) losses that stabilize small-target training. MambaDSF achieves 91.5% mAP50 on the UATD forward-looking sonar benchmark with 28.7 million parameters, surpassing all compared detectors. On a small-target subset the gain reaches +2.2 percentage points, and cross-domain evaluation on FLS and MD-FLS confirms that the proposed architecture generalizes. The code is publicly available at https://github.com/IDontKnowAAA/MambaDSF.
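To illustrate why small targets are hard to score and to train on with IoU-based criteria, the following minimal sketch computes plain intersection-over-union for two box sizes under the same 2-pixel localization error. It is not the SA-WIoU loss, whose definition is not given here; the helper names `iou` and `shifted_iou` and the box sizes are assumptions chosen for illustration only. The 0.5 threshold is the IoU cutoff conventionally used for mAP50.

```python
def iou(box_a, box_b):
    """Plain IoU for axis-aligned boxes given as (x1, y1, x2, y2) in pixels."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(size, shift):
    """IoU between a size x size ground-truth box and the same box
    displaced diagonally by `shift` pixels (hypothetical helper)."""
    gt = (0, 0, size, size)
    pred = (shift, shift, size + shift, size + shift)
    return iou(gt, pred)


# Same 2-pixel error, two target sizes (illustrative values).
small = shifted_iou(8, 2)    # 8 x 8 pixel target
large = shifted_iou(64, 2)   # 64 x 64 pixel target

MAP50_THRESHOLD = 0.5  # a detection counts as correct at IoU >= 0.5
print(f"small target IoU: {small:.3f}, match: {small >= MAP50_THRESHOLD}")
print(f"large target IoU: {large:.3f}, match: {large >= MAP50_THRESHOLD}")
```

Under these assumed sizes, the identical pixel error leaves the large box well above the 0.5 cutoff while the small box falls below it, which is one reason scale-aware weighting of an IoU loss can be useful for small targets.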