Rethinking Infrared Small Target Detection: A Foundation-Driven Efficient Paradigm

📅 2025-12-05

🤖 AI Summary

To address the trade-off between weak generalization and high inference overhead in single-frame infrared small target (SIRST) detection, this paper proposes the Foundation-Driven Efficient Paradigm (FDEP). FDEP uses frozen representations from a large-scale visual foundation model (VFM) as global semantic priors, which the authors describe as the first systematic use of VFMs in SIRST. A Semantic Alignment Modulation Fusion (SAMF) module dynamically aligns and fuses these priors with task-specific features. A Collaborative Optimization-based Implicit Self-Distillation (CO-ISD) strategy enables implicit semantic transfer between the main and lightweight branches through parameter sharing and synchronized backpropagation, so the VFM adds no inference cost. The paper also proposes the Holistic SIRST Evaluation (HSE) metric, which performs multi-threshold integral evaluation at both pixel-level confidence and target-level robustness. Experiments on multiple public datasets show that SIRST detection networks equipped with FDEP achieve state-of-the-art performance. The source code is publicly available.

📝 Abstract

While large-scale visual foundation models (VFMs) exhibit strong generalization across diverse visual domains, their potential for single-frame infrared small target (SIRST) detection remains largely unexplored. To fill this gap, we systematically introduce the frozen representations from VFMs into the SIRST task for the first time and propose a Foundation-Driven Efficient Paradigm (FDEP), which can seamlessly adapt to existing encoder-decoder-based methods and significantly improve accuracy without additional inference overhead. Specifically, a Semantic Alignment Modulation Fusion (SAMF) module is designed to achieve dynamic alignment and deep fusion of the global semantic priors from VFMs with task-specific features. Meanwhile, to avoid the inference time burden introduced by VFMs, we propose a Collaborative Optimization-based Implicit Self-Distillation (CO-ISD) strategy, which enables implicit semantic transfer between the main and lightweight branches through parameter sharing and synchronized backpropagation. In addition, to unify the fragmented evaluation system, we construct a Holistic SIRST Evaluation (HSE) metric that performs multi-threshold integral evaluation at both pixel-level confidence and target-level robustness, providing a stable and comprehensive basis for fair model comparison. Extensive experiments demonstrate that the SIRST detection networks equipped with our FDEP framework achieve state-of-the-art (SOTA) performance on multiple public datasets. Our code is available at https://github.com/YuChuang1205/FDEP-Framework

Problem

Research questions and friction points this paper is trying to address.

Adapts foundation models to infrared target detection

Improves accuracy without extra inference cost

Unifies evaluation metrics for fair model comparison
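
To make the idea of multi-threshold integral evaluation concrete, here is a minimal sketch in plain Python. It is not the paper's HSE definition: the function names (`iou_at_threshold`, `integrated_iou`), the use of pixel-level IoU as the per-threshold score, and the trapezoidal averaging are all assumptions chosen for illustration. It only shows how a score can be integrated over a range of confidence thresholds instead of being reported at a single fixed one.

```python
from typing import Sequence


def iou_at_threshold(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> float:
    """Pixel-level IoU after binarizing confidence scores at `threshold`."""
    pred = [s >= threshold for s in scores]
    inter = sum(1 for p, y in zip(pred, labels) if p and y)
    union = sum(1 for p, y in zip(pred, labels) if p or y)
    # Convention assumed here: an empty prediction on an empty mask is perfect.
    return inter / union if union else 1.0


def integrated_iou(scores: Sequence[float], labels: Sequence[int],
                   thresholds: Sequence[float]) -> float:
    """Trapezoidal average of IoU over an increasing list of thresholds."""
    values = [iou_at_threshold(scores, labels, t) for t in thresholds]
    area = sum(
        (thresholds[i + 1] - thresholds[i]) * (values[i] + values[i + 1]) / 2
        for i in range(len(thresholds) - 1)
    )
    # Normalize by the threshold range so the result stays in [0, 1].
    return area / (thresholds[-1] - thresholds[0])


# Toy flattened image: six pixels, the first three belong to the target.
scores = [0.9, 0.8, 0.4, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 0, 0, 0]
thresholds = [0.1, 0.3, 0.5, 0.7]

print(integrated_iou(scores, labels, thresholds))
```

A model whose IoU is high only near one hand-picked threshold scores lower under this kind of average than one that stays accurate across the whole range, which is the motivation for threshold-integrated comparison.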

Innovation

Methods, ideas, or system contributions that make the work stand out.

Frozen foundation model representations for SIRST detection

Semantic alignment fusion module for feature integration

Implicit self-distillation strategy for efficient inference
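
The parameter-sharing idea behind the self-distillation strategy can be illustrated with a deliberately tiny scalar toy. This is not the authors' implementation: the names `train_step` and `infer_light`, the scalar `prior` standing in for frozen foundation-model features, the fusion weight `a`, and the squared-error loss are all hypothetical. The sketch only shows that when a main branch (with the prior) and a lightweight branch (without it) share a weight and are updated in the same backward pass, the lightweight branch can be used alone at inference.

```python
def train_step(w: float, a: float, x: float, prior: float, y: float,
               lr: float = 0.1) -> tuple[float, float]:
    """One synchronized gradient step on the summed loss of both branches."""
    main = w * x + a * prior   # main branch: shared weight plus frozen prior
    light = w * x              # lightweight branch: shared weight only
    # Loss = (main - y)^2 + (light - y)^2; the shared weight w receives
    # gradient from both branches in the same update.
    grad_w = 2 * (main - y) * x + 2 * (light - y) * x
    grad_a = 2 * (main - y) * prior
    return w - lr * grad_w, a - lr * grad_a


def infer_light(w: float, x: float) -> float:
    """Inference path: the prior (and thus the foundation model) is not needed."""
    return w * x


w, a = 0.0, 0.0
for _ in range(200):
    w, a = train_step(w, a, x=1.0, prior=1.0, y=1.0)

print(infer_light(w, 1.0))
```

In this toy the lightweight path ends up fitting the target on its own, because every update to the shared weight also had to satisfy the prior-free branch.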

🔎 Similar Papers

Infrared Small Target Detection based on Adjustable Sensitivity Strategy and Multi-Scale Fusion