SAMamba: Adaptive State Space Modeling with Hierarchical Vision for Infrared Small Target Detection

📅 2025-05-29
🤖 AI Summary
Infrared small target detection (ISTD) is difficult because targets occupy an extremely small fraction of the image (<0.15%), the signal-to-clutter ratio is low, and backgrounds are complex, which together lead to high false-negative and false-positive rates. To address these issues, we propose SAMamba, the first ISTD framework integrating SAM2's hierarchical visual representations with Mamba's selective state-space modeling. Its key components are: (1) a Feature Selection Adapter (FS-Adapter) for natural-to-infrared domain adaptation, combining token-level selection with a learnable task embedding and channel-wise adaptive transformations; (2) a Cross-Channel State-Space Interaction (CSI) module that models global context and long-range dependencies with linear complexity; and (3) a Detail-Preserving Contextual Fusion (DPCF) module that fuses multi-scale features through a gating mechanism to limit information loss during downsampling. Experiments on NUAA-SIRST, IRSTD-1k, and NUDT-SIRST show that SAMamba outperforms state-of-the-art methods, particularly with heterogeneous backgrounds and targets of varying scale.

📝 Abstract
Infrared small target detection (ISTD) is vital for long-range surveillance in military, maritime, and early warning applications. ISTD is challenged by targets occupying less than 0.15% of the image and low distinguishability from complex backgrounds. Existing deep learning methods often suffer from information loss during downsampling and inefficient global context modeling. This paper presents SAMamba, a novel framework integrating SAM2's hierarchical feature learning with Mamba's selective sequence modeling. Key innovations include: (1) A Feature Selection Adapter (FS-Adapter) for efficient natural-to-infrared domain adaptation via dual-stage selection (token-level with a learnable task embedding and channel-wise adaptive transformations); (2) A Cross-Channel State-Space Interaction (CSI) module for efficient global context modeling with linear complexity using selective state space modeling; and (3) A Detail-Preserving Contextual Fusion (DPCF) module that adaptively combines multi-scale features with a gating mechanism to balance high-resolution and low-resolution feature contributions. SAMamba addresses core ISTD challenges by bridging the domain gap, maintaining fine-grained details, and efficiently modeling long-range dependencies. Experiments on NUAA-SIRST, IRSTD-1k, and NUDT-SIRST datasets show SAMamba significantly outperforms state-of-the-art methods, especially in challenging scenarios with heterogeneous backgrounds and varying target scales. Code: https://github.com/zhengshuchen/SAMamba.
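The gating idea behind the DPCF module can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the scalar gate weights `w_high`, `w_low`, and `bias`, and the use of nearest-neighbour upsampling are all assumptions chosen for illustration. The sketch only shows how a sigmoid gate can form a per-pixel convex combination of a high-resolution map and an upsampled low-resolution map.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a 2D list by an integer factor."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def gated_fusion(high, low, w_high=1.0, w_low=1.0, bias=0.0):
    """Fuse a high-res map with a low-res map through a per-pixel gate.

    Hypothetical sketch: the gate weights would be learned in a real
    model; here they are fixed scalars. Assumes the size of `high` is an
    integer multiple of the size of `low`.
    """
    low_up = upsample_nearest(low, len(high) // len(low))
    fused = []
    for h_row, l_row in zip(high, low_up):
        row = []
        for h, l in zip(h_row, l_row):
            g = sigmoid(w_high * h + w_low * l + bias)  # gate in (0, 1)
            row.append(g * h + (1.0 - g) * l)  # convex combination
        fused.append(row)
    return fused
```

Because the gate lies strictly between 0 and 1, each fused value stays between the corresponding high-resolution and low-resolution values, which is one simple way to balance the two contributions.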
Problem

Research questions and friction points this paper is trying to address.

Detecting infrared small targets against complex backgrounds
Reducing information loss during image downsampling
Modeling global context efficiently
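
To make the scale of the problem concrete, the sketch below computes target occupancy, the fraction of image pixels that belong to the target, from a binary mask. The function name and the 256×256 image with a 3×3 target are hypothetical values chosen for illustration, not figures from the paper.

```python
def target_occupancy(mask):
    """Return the fraction of pixels in a binary mask that are target."""
    total = sum(len(row) for row in mask)
    target = sum(1 for row in mask for v in row if v)
    return target / total


# Hypothetical example: a 3x3 target in a 256x256 image.
mask = [[0] * 256 for _ in range(256)]
for r in range(100, 103):
    for c in range(100, 103):
        mask[r][c] = 1

occupancy = target_occupancy(mask)  # 9 / 65536, well below 0.15%
```

At this image size the 0.15% threshold corresponds to roughly 98 pixels, so a target of a few pixels is easily removed by repeated downsampling.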
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature Selection Adapter for domain adaptation
Cross-Channel State-Space Interaction module
Detail-Preserving Contextual Fusion module
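
The linear-complexity claim for selective state space modeling can be seen in a simplified scalar recurrence. The sketch below is an assumption-laden illustration, not the CSI module itself: it uses a single scalar state, fixed scalars `b` and `c`, and a hypothetical weight `w_delta` that makes the step size depend on the input. One pass over the sequence updates the state, so the cost grows linearly with sequence length.

```python
import math


def softplus(x):
    return math.log1p(math.exp(x))


def selective_scan(xs, a=-1.0, w_delta=1.0, b=1.0, c=1.0):
    """Scalar selective state-space scan in O(len(xs)) time.

    Hypothetical simplification: the step size `delta` depends on the
    current input, which makes the recurrence input-selective. Real
    models use vector states and learned, input-dependent projections.
    """
    h = 0.0
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)  # input-dependent step size, > 0
        a_bar = math.exp(delta * a)    # discretized decay, in (0, 1) for a < 0
        b_bar = delta * b              # simplified discretized input weight
        h = a_bar * h + b_bar * x      # state update
        ys.append(c * h)               # readout
    return ys
```

A larger `delta` makes the state decay faster and weight the current input more heavily, while a smaller `delta` preserves more of the earlier context.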
Authors

Wenhao Xu
Unknown affiliation

Shuchen Zheng
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100083, China.

Changwei Wang
Shandong Computer Science Center
Multimodal Learning, Embodied AI, Edge Intelligent Computing, AI for Healthcare, Safety Alignment

Zherui Zhang
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100083, China.

Chuan Ren
School of Software and Microelectronics, Peking University, Beijing, 100871, China.

Rongtao Xu
MBZUAI << CASIA << HUST
Intelligent Robot, Embodied AI, VLA, VLM, Spatialtemporal AI

Shibiao Xu
Beijing University of Posts and Telecommunications
Computer Vision, Machine Learning, Computer Graphics