🤖 AI Summary
This work addresses infrared small target detection, which suffers from high false alarm rates because targets have weak signatures and low signal-to-clutter ratios and are easily confused with complex backgrounds. Existing encoder-decoder architectures are limited by an information bottleneck in early convolutional stages and by static skip connections. To overcome these issues, the authors propose SANet, a U-Net-based network with two new components. The Dual-path Semantic-aware Module (DSM) combines standard convolutions, which preserve fine-grained local detail, with pinwheel-shaped convolutions, which provide expanded, direction-sensitive receptive fields, and then applies the CBAM attention mechanism to recalibrate features along the spatial and channel dimensions. The Selective Attention Fusion Module (SAFM) replaces static skip connections with a learnable, spatially adaptive weighting strategy that enables context-aware, dynamic cross-scale feature fusion. The authors report that the proposed method improves detection accuracy and robustness, suppresses false alarms in heavy clutter, and outperforms state-of-the-art approaches.
📝 Abstract
Infrared small target detection (IRSTD) plays a pivotal role in a broad spectrum of mission-critical applications, including maritime surveillance, military search and rescue, early warning systems, and precision-guided strikes, all of which demand the precise identification of dim, sub-pixel targets amid highly cluttered infrared backgrounds. Despite significant progress driven by deep learning methods, fundamental challenges persist: infrared small targets occupy extremely limited spatial extents (often only a few pixels), exhibit low signal-to-clutter ratios, and are easily confused with structurally complex backgrounds that frequently induce false alarms. Existing encoder-decoder architectures suffer from two key limitations: an information bottleneck in early convolutional stages that undermines fine-grained target perception, and static skip connections that lack the dynamic adaptability required to discriminate between genuine targets and pseudo-target regions. To address these challenges, we propose SANet, a Selective Attention-based Network built upon the classical U-Net framework and augmented with two novel components: (1) a Dual-path Semantic-aware Module (DSM) that integrates standard convolutions for local spatial detail preservation with pinwheel-shaped convolutions for expanded, direction-sensitive receptive fields, followed by a Convolutional Block Attention Module (CBAM) for fine-grained spatial-channel feature recalibration; and (2) a Selective Attention Fusion Module (SAFM) that replaces conventional static skip connections with a spatially adaptive, learnable weighting mechanism to perform context-aware, cross-scale feature fusion.
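The abstract does not give the exact form of the SAFM weighting, so the following is only a minimal NumPy sketch of what a spatially adaptive skip fusion could look like, not the authors' implementation. It assumes a single per-pixel gate computed by a 1x1 convolution over the concatenated encoder and decoder features; the function name `gated_skip_fusion`, the `weight` and `bias` parameters, and the convex-combination form are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_skip_fusion(enc, dec, weight, bias=0.0):
    """Fuse encoder (skip) and decoder features with a per-pixel gate.

    Illustrative assumption, not the SAFM from the paper.

    enc, dec: arrays of shape (C, H, W) at the same resolution.
    weight:   array of shape (2*C,), acting as a 1x1 convolution that maps
              the concatenated features to one gate logit per pixel.
    bias:     scalar added to every gate logit.
    """
    if enc.shape != dec.shape:
        raise ValueError("enc and dec must have the same shape")
    stacked = np.concatenate([enc, dec], axis=0)  # (2C, H, W)
    # Contract the channel axis: equivalent to a 1x1 conv with one output map.
    logits = np.tensordot(weight, stacked, axes=([0], [0])) + bias  # (H, W)
    gate = sigmoid(logits)  # per-pixel weight between 0 and 1
    # Convex combination: the gate decides, pixel by pixel, how much of the
    # encoder detail versus the decoder context is kept.
    fused = gate[None] * enc + (1.0 - gate[None]) * dec
    return fused, gate


rng = np.random.default_rng(0)
enc = rng.normal(size=(4, 8, 8))   # hypothetical encoder feature map
dec = rng.normal(size=(4, 8, 8))   # hypothetical upsampled decoder feature map
weight = rng.normal(size=(8,))     # in a real network this would be learned
fused, gate = gated_skip_fusion(enc, dec, weight)
print(fused.shape, gate.shape)
```

In this sketch, an all-zero `weight` gives a gate of 0.5 everywhere, which reduces to a fixed average of the two inputs, one example of a static skip connection. A learned `weight` lets the mix vary with position, which is the property the abstract attributes to SAFM.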