🤖 AI Summary
Infrared small target detection (ISTD) faces two major challenges: a severe lack of target texture and heavy background clutter. To address these, this paper proposes a basis-decomposition-based feature disentanglement framework built around a difference decomposition mechanism. Specifically, the authors design four components: a scalable Basis Decomposition Module (BDM), a Spatial Difference Decomposition Module (SD²M), a Spatial Difference Decomposition Downsampling Module (SD³M), and a Temporal Difference Decomposition Module (TD²M), enabling simultaneous target feature enhancement and background interference suppression. The method integrates an improved U-shaped architecture with explicit motion modeling, supporting both single-frame and multi-frame detection. Evaluated on SISTD and MISTD benchmarks, the approach achieves state-of-the-art performance: STD²Net attains 87.68% mIoU on the multi-frame task, substantially outperforming the single-frame SD²Net (64.97%), demonstrating the effectiveness and generalizability of difference decomposition for feature disentanglement.
📝 Abstract
Infrared small target detection (ISTD) faces two major challenges: a lack of discernible target texture and severe background clutter that obscures the target. To enhance targets and suppress backgrounds, we propose the Basis Decomposition Module (BDM), an extensible and lightweight module that decomposes a complex feature into several basis features, enhancing informative components while eliminating redundancy. Extending BDM yields a series of modules: the Spatial Difference Decomposition Module (SD²M), the Spatial Difference Decomposition Downsampling Module (SD³M), and the Temporal Difference Decomposition Module (TD²M). Based on these modules, we develop the Spatial Difference Decomposition Network (SD²Net) for single-frame ISTD (SISTD) and the Spatiotemporal Difference Decomposition Network (STD²Net) for multi-frame ISTD (MISTD). SD²Net integrates SD²M and SD³M within an adapted U-shaped architecture; employing TD²M to introduce motion information transforms SD²Net into STD²Net. Extensive experiments on SISTD and MISTD datasets demonstrate state-of-the-art (SOTA) performance. On the SISTD task, SD²Net compares favorably with most established networks. On the MISTD datasets, STD²Net achieves an mIoU of 87.68%, outperforming SD²Net, which achieves an mIoU of 64.97%. Our code is available at https://github.com/greekinRoma/IRSTD_HC_Platform.
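The abstract's core idea, decomposing a feature into basis components and then re-weighting them to enhance useful information and suppress redundancy, can be illustrated with a minimal numpy sketch. This is a hypothetical simplification, not the paper's actual BDM: the function `basis_decomposition`, the orthonormal `bases`, and the per-basis `gates` are all illustrative assumptions standing in for the learned decomposition and enhancement operations.

```python
import numpy as np

def basis_decomposition(x, bases, gates):
    """Illustrative basis-decomposition step (assumed form, not the paper's BDM).

    x:     (C,)   input feature vector
    bases: (K, C) rows are basis directions (assumed orthonormal here)
    gates: (K,)   per-basis weights: >1 enhances, <1 suppresses, 0 removes
    """
    coeffs = bases @ x        # decompose: coefficient along each basis feature
    coeffs = coeffs * gates   # enhance some components, eliminate redundancy
    return bases.T @ coeffs   # recombine into the enhanced feature

rng = np.random.default_rng(0)
# Build K=4 orthonormal basis directions in C=8 dimensions via QR
q, _ = np.linalg.qr(rng.standard_normal((8, 4)))
bases = q.T                                # (4, 8), rows orthonormal
x = rng.standard_normal(8)
gates = np.array([1.5, 1.0, 0.2, 0.0])     # boost, keep, damp, drop
y = basis_decomposition(x, bases, gates)   # enhanced feature, shape (8,)
```

With all gates set to 1 this reduces to a plain projection onto the basis subspace, which makes the "enhance/suppress" role of the gates easy to verify in isolation.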