🤖 AI Summary
Moving infrared small target detection is difficult because targets are tiny, contrast against the background is weak, and manual annotation of infrared video is expensive. Method: We propose WeCoL, a weakly supervised contrastive learning framework for this task that requires only simple target quantity prompts during training. Built on the pretrained Segment Anything Model (SAM), it mines potential targets by combining target activation maps with multi-frame energy accumulation, applies contrastive learning in a feature subspace to make the pseudo-labels more reliable, and uses long-short term motion-aware learning to model both local motion patterns and global motion trajectories. Contribution/Results: The authors present WeCoL as the first exploration of weak supervision for moving infrared small target detection, removing the dependence on large numbers of manual target-wise annotations that fully supervised methods require. Experiments on DAUB and ITSDT-15K show that WeCoL often outperforms early fully supervised methods and can reach over 90% of the performance of state-of-the-art fully supervised ones, while substantially reducing annotation overhead.
📝 Abstract
Unlike general object detection, moving infrared small target detection is highly challenging because of tiny target size and weak background contrast. Most existing methods are fully supervised and rely heavily on a large number of manual target-wise annotations. However, manually annotating video sequences is expensive and time-consuming, especially for low-quality infrared frames. As general object detection has shown, non-fully supervised strategies (e.g., weak supervision) have the potential to reduce annotation requirements. To move beyond traditional fully supervised frameworks, this paper, as the first exploration in this direction, proposes a new weakly supervised contrastive learning (WeCoL) scheme that requires only simple target quantity prompts during model training. Specifically, based on the pretrained Segment Anything Model (SAM), a potential target mining strategy is designed to integrate target activation maps and multi-frame energy accumulation. In addition, contrastive learning is adopted to further improve the reliability of pseudo-labels by calculating the similarity between positive and negative samples in a feature subspace. Moreover, we propose a long-short term motion-aware learning scheme to simultaneously model the local motion patterns and global motion trajectory of small targets. Extensive experiments on two public datasets (DAUB and ITSDT-15K) verify that our weakly supervised scheme can often outperform early fully supervised methods, and its performance can reach over 90% of that of state-of-the-art (SOTA) fully supervised ones.
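The abstract says contrastive learning scores the similarity between positive and negative samples in a feature subspace, but it does not give the loss. As a rough illustration only, the sketch below uses a generic InfoNCE-style loss with cosine similarity. The function name `info_nce`, the temperature value, and the choice of InfoNCE itself are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def info_nce(anchor, positives, negatives, temperature=0.1):
    """Generic InfoNCE-style loss for one anchor feature (illustrative only).

    anchor:    (d,)   feature of a candidate target
    positives: (P, d) features treated as the same class as the anchor
    negatives: (N, d) features treated as background / clutter
    """
    def normalize(x):
        # L2-normalize so that dot products are cosine similarities
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a = normalize(np.asarray(anchor, dtype=float))
    pos = normalize(np.asarray(positives, dtype=float))
    neg = normalize(np.asarray(negatives, dtype=float))

    pos_logits = pos @ a / temperature  # (P,)
    neg_logits = neg @ a / temperature  # (N,)

    # One row per positive: [its own logit, all negative logits]
    rows = np.concatenate(
        [pos_logits[:, None],
         np.broadcast_to(neg_logits, (len(pos_logits), len(neg_logits)))],
        axis=1,
    )
    # Numerically stable log-sum-exp over each row
    m = rows.max(axis=1, keepdims=True)
    log_denom = m[:, 0] + np.log(np.exp(rows - m).sum(axis=1))

    # Mean over positives of -log(exp(pos) / (exp(pos) + sum(exp(neg))))
    return float(np.mean(log_denom - pos_logits))


# Toy 2-D features: the anchor is close to its positive, far from negatives
anchor = [1.0, 0.0]
loss_good = info_nce(anchor, [[1.0, 0.1]], [[0.0, 1.0], [-1.0, 0.0]])
# Same features with the roles swapped: the "positive" is now dissimilar
loss_bad = info_nce(anchor, [[0.0, 1.0]], [[1.0, 0.1], [-1.0, 0.0]])
```

In this toy setup the loss is small when the anchor resembles its positive and large when it resembles a negative, which is the property a pseudo-label refinement step of this kind would rely on.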