Revisiting the Scale Loss Function and Gaussian-Shape Convolution for Infrared Small Target Detection

📅 2026-04-10
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses two problems in infrared small target detection: training instability caused by non-monotonic scale loss functions, and insufficient spatial attention caused by generic convolutional kernels that overlook the target's center-concentrated intensity distribution. The authors jointly consider the physical imaging characteristics of small targets and the geometric properties of the loss function. The key contributions are a diff-based scale loss, built on the signed area difference between the predicted mask and the ground truth, that yields strictly monotonic gradients; a Gaussian-shaped convolution kernel with a learnable scale parameter; and a rotated pinwheel mask that aligns the kernel with target orientation and is trained end to end via a straight-through estimator. Experiments on the IRSTD-1k, NUDT-SIRST, and SIRST-UAVB benchmarks show consistent improvements over state-of-the-art methods in mIoU, probability of detection (Pd), and false alarm rate (Fa).
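
To make the quantity behind the scale loss concrete, here is a minimal sketch of a signed area difference between a predicted mask and a ground-truth mask. It assumes "area" means the sum of mask values and that the sign convention is predicted minus ground truth; the function name `signed_area_difference` is hypothetical, and how the paper turns this quantity into a loss weight is not stated in this summary, so that step is deliberately left out.

```python
def signed_area_difference(pred, target):
    """Return area(pred) - area(target) for two same-shaped 2D masks.

    Assumption: "area" is the sum of mask values in [0, 1]. A positive
    result means over-segmentation, a negative one under-segmentation.
    """
    area_pred = sum(sum(row) for row in pred)
    area_target = sum(sum(row) for row in target)
    return area_pred - area_target


# Toy 3x3 masks: the prediction covers 4 pixels, the ground truth 2.
pred = [[0.0, 1.0, 1.0],
        [0.0, 1.0, 1.0],
        [0.0, 0.0, 0.0]]
gt = [[0.0, 1.0, 0.0],
      [0.0, 1.0, 0.0],
      [0.0, 0.0, 0.0]]

diff = signed_area_difference(pred, gt)
```

Unlike an absolute difference, the signed value distinguishes predictions that are too large from those that are too small.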

📝 Abstract
Infrared small target detection still faces two persistent challenges: training instability from non-monotonic scale loss functions, and inadequate spatial attention due to generic convolution kernels that ignore the physical imaging characteristics of small targets. In this paper, we revisit both aspects. On the loss side, we propose a diff-based scale loss that weights predictions according to the signed area difference between the predicted mask and the ground truth, yielding strictly monotonic gradients and stable convergence. We further analyze a family of four scale loss variants to understand how their geometric properties affect detection behavior. On the spatial side, we introduce Gaussian-shaped convolution with a learnable scale parameter to match the center-concentrated intensity profile of infrared small targets, and augment it with a rotated pinwheel mask that adaptively aligns the kernel with target orientation via a straight-through estimator. Extensive experiments on IRSTD-1k, NUDT-SIRST, and SIRST-UAVB demonstrate consistent improvements in $mIoU$, $P_d$, and $F_a$ over state-of-the-art methods. We release our anonymous code and pretrained models.
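
As a rough illustration of a center-concentrated kernel, the sketch below builds a 2D Gaussian mask and multiplies it elementwise into a set of kernel weights. This is an assumption about how a Gaussian shape might be imposed, not the authors' implementation: the names `gaussian_mask` and `apply_mask` are hypothetical, `sigma` is a plain float here rather than a learnable parameter, and the rotated pinwheel mask and straight-through estimator are not modeled.

```python
import math


def gaussian_mask(size, sigma):
    """Return a size x size Gaussian mask that peaks at the kernel center."""
    center = (size - 1) / 2.0
    return [
        [
            math.exp(-((i - center) ** 2 + (j - center) ** 2) / (2.0 * sigma ** 2))
            for j in range(size)
        ]
        for i in range(size)
    ]


def apply_mask(weights, mask):
    """Elementwise product of kernel weights and a same-shaped mask."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]


# A uniform 3x3 kernel becomes center-weighted after masking.
mask = gaussian_mask(3, sigma=1.0)
uniform = [[1.0] * 3 for _ in range(3)]
shaped = apply_mask(uniform, mask)
```

A smaller `sigma` concentrates more of the kernel's weight at the center, which is the knob a learnable scale parameter would adjust during training.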

Problem

Research questions and friction points this paper is trying to address.

infrared small target detection
scale loss function
spatial attention
convolution kernel
training instability

Innovation

Methods, ideas, or system contributions that make the work stand out.

diff-based scale loss
Gaussian-shaped convolution
rotated pinwheel mask
infrared small target detection
monotonic gradient