🤖 AI Summary
Infrared small target detection (IRSTD) suffers from a low signal-to-noise ratio, complex heterogeneous backgrounds, and intrinsically weak target features. Conventional encoder-decoder architectures use static decoder parameters. This limits their adaptability to cross-scenario distribution shifts (e.g., day/night conditions or sky/maritime/terrestrial backgrounds) and thus impairs robustness. To address this, we propose an image-state-aware meta-decoding framework. For the first time, the input image is fed directly into decoder generation: a Transformer dynamically produces the decoding parameters from it. In this Transformer, self-attention models inter-layer dependencies and cross-attention enables scene-adaptive decoding. High-frequency information is explicitly injected to improve edge preservation and localization accuracy. The decoder parameters are represented as a 2D tensor and generated through a meta-learning mechanism, which preserves hierarchical correlations across layers. The method achieves state-of-the-art performance on NUDT-SIRST, NUAA-SIRST, and IRSTD-1K and shows significantly improved cross-scenario generalization.
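The summary does not give the Transformer's depth, head count, or how the generated parameters are consumed. What follows is a minimal NumPy sketch of the idea of an image-conditioned decoder, not IrisNet's implementation. It rests on several assumptions. It uses a single attention head with one self-attention step and one cross-attention step. The decoder is modelled as a stack of 1x1 convolutions, and each 1x1 convolution takes its weights from one row of the 2D parameter tensor. The generated parameters are treated as a residual update to a static base. All names (`ImageToDecoderSketch`, `base_params`, `run_decoder`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Single-head scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


class ImageToDecoderSketch:
    """Hypothetical image-to-decoder block. The decoder's parameters form a
    2D tensor of shape (num_layers, params_per_layer), one row per layer."""

    def __init__(self, num_layers, params_per_layer, img_dim, d_model, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            return rng.normal(0.0, 0.1, shape)

        self.base_params = init(num_layers, params_per_layer)  # static decoder weights
        self.w_in = init(params_per_layer, d_model)   # layer row -> token
        self.w_img = init(img_dim, d_model)           # image feature -> token
        self.self_qkv = [init(d_model, d_model) for _ in range(3)]
        self.cross_qkv = [init(d_model, d_model) for _ in range(3)]
        self.w_out = init(d_model, params_per_layer)  # token -> parameter update

    def __call__(self, img_tokens):
        """img_tokens: (N, img_dim) flattened features of one infrared image."""
        h = self.base_params @ self.w_in  # (L, d_model): one token per decoder layer
        wq, wk, wv = self.self_qkv
        # Self-attention among layer tokens models inter-layer dependencies.
        h = h + attention(h @ wq, h @ wk, h @ wv)
        wq, wk, wv = self.cross_qkv
        m = img_tokens @ self.w_img  # (N, d_model)
        # Cross-attention: layer tokens query the image, so the output depends on it.
        h = h + attention(h @ wq, m @ wk, m @ wv)
        # Image-conditioned residual update of the static parameters.
        return self.base_params + h @ self.w_out  # (L, params_per_layer)


def run_decoder(feat, params, channels):
    """Apply each generated row as a (channels x channels) 1x1 conv + ReLU.
    feat: (num_pixels, channels)."""
    x = feat
    for row in params:
        x = np.maximum(x @ row.reshape(channels, channels), 0.0)
    return x


C, L = 4, 3
gen = ImageToDecoderSketch(num_layers=L, params_per_layer=C * C, img_dim=8, d_model=16)
rng = np.random.default_rng(1)
img_a, img_b = rng.normal(size=(10, 8)), rng.normal(size=(10, 8))
params_a, params_b = gen(img_a), gen(img_b)  # each (3, 16), differing per image
out = run_decoder(rng.normal(size=(25, C)), params_a, C)  # (25, 4)
```

In this sketch, two different images yield two different parameter tensors from the same generator. A static decoder, by contrast, applies `base_params` to every input. Keeping the parameters as a (layers x parameters) grid is what lets self-attention treat each decoder layer as a token.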
📝 Abstract
Infrared Small Target Detection (IRSTD) faces significant challenges due to low signal-to-noise ratios, complex backgrounds, and the absence of discernible target features. Deep learning-based encoder-decoder frameworks have advanced the field. However, their static pattern learning suffers from pattern drift across diverse scenarios (e.g., day/night variations and sky/maritime/ground domains), which limits robustness. To address this, we propose IrisNet, a novel meta-learned framework that dynamically adapts its detection strategy to the state of the input infrared image. Our approach uses an image-to-decoder transformer to establish a dynamic mapping between infrared image features and the entire set of decoder parameters. More concretely, we represent the parameterized decoder as a structured 2D tensor that preserves hierarchical layer correlations. The transformer models inter-layer dependencies through self-attention and generates adaptive decoding patterns through cross-attention. To further enhance the perception of infrared images, we integrate high-frequency components that supplement target-position and scene-edge information. Experiments on the NUDT-SIRST, NUAA-SIRST, and IRSTD-1K datasets demonstrate the superiority of IrisNet, which achieves state-of-the-art performance.
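The abstract does not say how the high-frequency components are extracted. One common choice, assumed here rather than taken from the paper, is a one-level 2D Haar wavelet transform. Its three detail bands respond to horizontal, vertical, and diagonal intensity changes. The sketch below uses NumPy. It shows that on a flat background, high-frequency energy appears only around a small bright target. It then stacks the resulting map with the raw image as an extra input channel, which is one hypothetical way to "integrate" it. `haar_dwt2` and `high_freq_map` are illustrative names.

```python
import numpy as np


def haar_dwt2(img):
    """One-level orthonormal 2D Haar transform of an (H, W) array, H and W even."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    ll = (a + b + c + d) / 2  # low-frequency approximation
    lh = (a + b - c - d) / 2  # row differences: horizontal edges
    hl = (a - b + c - d) / 2  # column differences: vertical edges
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh


def high_freq_map(img):
    """Magnitude of the three detail bands, upsampled back to (H, W)."""
    _, lh, hl, hh = haar_dwt2(img)
    mag = np.sqrt(lh**2 + hl**2 + hh**2)
    return np.repeat(np.repeat(mag, 2, axis=0), 2, axis=1)


# Flat background with one bright pixel standing in for a small target.
img = np.full((8, 8), 10.0)
img[3, 3] = 20.0
hf = high_freq_map(img)
print(np.argwhere(hf > 0))  # only rows 2-3, cols 2-3: the 2x2 block holding the target

# Hypothetical injection: raw image plus high-frequency map as a 2-channel input.
x = np.stack([img, hf], axis=0)  # (2, 8, 8)
```

Because this Haar transform is orthonormal, it preserves signal energy. On the flat background every detail coefficient is exactly zero, so the high-frequency channel carries target-position and edge information and nothing from the uniform background.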