🤖 AI Summary
Infrared small target detection (ISTD) is hampered by extremely low target occupancy (<0.15% of the image), a low signal-to-clutter ratio, and complex backgrounds, which together lead to high false-negative and false-positive rates. To address these issues, we propose SAMamba, the first ISTD framework to integrate SAM2's hierarchical visual representations with Mamba's selective state-space modeling. Our key innovations include: (1) an FS-Adapter module for domain-adaptive feature alignment to mitigate cross-scene generalization bias; (2) a CSI module that captures long-range dependencies for efficient global context modeling; and (3) a DPCF module for detail-preserving feature fusion that limits information loss during downsampling. The framework further incorporates learnable task embeddings, channel-adaptive transformations, and gated multi-scale fusion. Extensive experiments on NUAA-SIRST, IRSTD-1k, and NUDT-SIRST show that the framework consistently outperforms state-of-the-art methods in both detection accuracy and robustness, particularly under heterogeneous backgrounds and in multi-scale target scenarios.
📝 Abstract
Infrared small target detection (ISTD) is vital for long-range surveillance in military, maritime, and early warning applications. The task is difficult because targets occupy less than 0.15% of the image and are hard to distinguish from complex backgrounds. Existing deep learning methods often lose information during downsampling and model global context inefficiently. This paper presents SAMamba, a novel framework that integrates SAM2's hierarchical feature learning with Mamba's selective sequence modeling. Key innovations include: (1) a Feature Selection Adapter (FS-Adapter) for efficient natural-to-infrared domain adaptation via dual-stage selection (token-level selection with a learnable task embedding and channel-wise adaptive transformations); (2) a Cross-Channel State-Space Interaction (CSI) module that models global context with linear complexity using selective state space modeling; and (3) a Detail-Preserving Contextual Fusion (DPCF) module that adaptively combines multi-scale features through a gating mechanism that balances high-resolution and low-resolution feature contributions. SAMamba addresses the core ISTD challenges by bridging the domain gap, preserving fine-grained details, and efficiently modeling long-range dependencies. Experiments on the NUAA-SIRST, IRSTD-1k, and NUDT-SIRST datasets show that SAMamba significantly outperforms state-of-the-art methods, especially in challenging scenarios with heterogeneous backgrounds and varying target scales. Code: https://github.com/zhengshuchen/SAMamba.
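To make the gating idea behind DPCF concrete, the following is a minimal sketch of gated fusion between a high-resolution and a low-resolution feature map. It is an illustration under stated assumptions, not the authors' implementation: the function names `upsample_nearest` and `gated_fusion` are hypothetical, the scalar parameters `w_high`, `w_low`, and `bias` stand in for what would be learned convolution weights, and nearest-neighbour upsampling is chosen only for simplicity. The abstract does not specify any of these details.

```python
import math


def sigmoid(z):
    """Logistic function; maps a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a 2D list by an integer factor."""
    rows, cols = len(feat), len(feat[0])
    return [
        [feat[r // factor][c // factor] for c in range(cols * factor)]
        for r in range(rows * factor)
    ]


def gated_fusion(high, low, w_high=1.0, w_low=1.0, bias=0.0):
    """Fuse a high-res map with a coarser low-res map via a per-pixel gate.

    Assumption: the gate is a sigmoid of a linear combination of the two
    inputs. In a trained network these scalars would be learned layers.
    """
    if len(high) % len(low) != 0 or len(high[0]) % len(low[0]) != 0:
        raise ValueError("high-res size must be an integer multiple of low-res size")
    factor = len(high) // len(low)
    low_up = upsample_nearest(low, factor)

    fused = []
    for high_row, low_row in zip(high, low_up):
        fused_row = []
        for h, l in zip(high_row, low_row):
            gate = sigmoid(w_high * h + w_low * l + bias)
            # Convex combination: gate near 1 keeps fine detail,
            # gate near 0 keeps coarse context.
            fused_row.append(gate * h + (1.0 - gate) * l)
        fused.append(fused_row)
    return fused


if __name__ == "__main__":
    # A 4x4 high-res map with a single bright pixel (a "small target")
    # and a 2x2 low-res map carrying coarse background context.
    high = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    low = [[0.2, 0.1], [0.1, 0.1]]
    for row in gated_fusion(high, low):
        print(["%.3f" % v for v in row])
```

Because the gate lies strictly between 0 and 1, each fused value is a convex combination of the two inputs, so it never leaves the range spanned by the high-resolution and upsampled low-resolution responses at that pixel. This is one simple way to balance fine detail against coarse context; the actual DPCF design may differ.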