🤖 AI Summary
Infrared small target detection (IRSTD) faces significant challenges in complex backgrounds due to the difficulty of distinguishing weak target signals from strong clutter. To address this, we propose a spatial–frequency collaborative modeling network: (1) Haar wavelet convolution is introduced to explicitly capture target energy characteristics in the frequency domain; (2) a Shifted Spatial Attention (SSA) mechanism enables low-complexity long-range dependency modeling; and (3) a Residual Dual-Channel Attention (RDCA) module adaptively suppresses background interference. Our approach is the first to deeply integrate interpretable frequency-domain priors with efficient attention mechanisms, substantially enhancing weak-target perception. Extensive experiments on mainstream benchmarks, including M3TD and IRSTD, demonstrate consistent improvements, achieving 3.2–5.7% gains in mean average precision (mAP). Notably, the method exhibits superior robustness under challenging conditions such as cloud cover and heavy noise. This work establishes a novel paradigm for real-time, high-precision IRSTD.
📝 Abstract
Infrared small target detection (IRSTD) is critical in both civilian and military applications. This study addresses the challenge of precise IRSTD in complex backgrounds. Recent methods rely fundamentally on conventional convolution operations, which primarily capture local spatial patterns and struggle to distinguish the unique frequency-domain characteristics of small targets from intricate background clutter. To overcome these limitations, we propose the Synergistic Wavelet-Attention Network (SWAN), a novel framework designed to perceive targets in both the spatial and frequency domains. SWAN leverages a Haar Wavelet Convolution (HWConv) for deep, cross-domain fusion of the frequency energy and spatial details of small targets. Furthermore, a Shifted Spatial Attention (SSA) mechanism efficiently models long-range spatial dependencies with linear computational complexity, enhancing contextual awareness. Finally, a Residual Dual-Channel Attention (RDCA) module adaptively calibrates channel-wise feature responses to suppress background interference while amplifying target-pertinent signals. Extensive experiments on benchmark datasets demonstrate that SWAN surpasses existing state-of-the-art methods, showing significant improvements in detection accuracy and robustness, particularly in complex, challenging scenarios.
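
To make the idea of a small target's "frequency energy" concrete, the following is a minimal sketch of a single-level 2D Haar decomposition in plain Python. It is not the HWConv module described in the abstract, whose internals are not given here; the function names `haar_dwt2` and `energy`, the toy 8x8 frame, and the orthonormal scaling are all assumptions chosen for illustration. It only shows that an isolated bright pixel leaves energy in the detail subbands while a flat background does not.

```python
def haar_dwt2(img):
    """Single-level 2D Haar transform of a 2D list with even height and width.

    Returns four half-resolution subbands: the local average plus three
    detail bands (column difference, row difference, diagonal difference).
    Orthonormal scaling is used, so total energy is preserved.
    """
    rows, cols = len(img), len(img[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must be even")
    avg, col_diff, row_diff, diag = [], [], [], []
    for i in range(0, rows, 2):
        avg_r, col_r, row_r, diag_r = [], [], [], []
        for j in range(0, cols, 2):
            # The four pixels of one non-overlapping 2x2 block.
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            avg_r.append((a + b + c + d) / 2)   # low-pass in both directions
            col_r.append((a - b + c - d) / 2)   # difference across columns
            row_r.append((a + b - c - d) / 2)   # difference across rows
            diag_r.append((a - b - c + d) / 2)  # diagonal difference
        avg.append(avg_r)
        col_diff.append(col_r)
        row_diff.append(row_r)
        diag.append(diag_r)
    return avg, col_diff, row_diff, diag


def energy(band):
    """Sum of squared coefficients in one subband."""
    return sum(v * v for row in band for v in row)


# Hypothetical toy frame: flat background with one bright "target" pixel.
frame = [[1.0] * 8 for _ in range(8)]
frame[3][4] = 9.0

background = [[1.0] * 8 for _ in range(8)]

target_detail = sum(energy(b) for b in haar_dwt2(frame)[1:])
background_detail = sum(energy(b) for b in haar_dwt2(background)[1:])
print("detail energy with target:", target_detail)
print("detail energy, background only:", background_detail)
```

In this toy case the flat background produces zero detail energy, so all of the detail-band energy comes from the single bright pixel. Real infrared clutter (cloud edges, noise) also populates the detail bands, which is why a learned fusion of frequency and spatial cues, as the abstract proposes, is needed rather than a fixed threshold.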