FSMODNet: A Closer Look at Few-Shot Detection in Multispectral Data

📅 2025-09-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Few-shot multispectral object detection (FSMOD) aims to jointly detect objects from visible and infrared modalities using only a minimal number of annotated samples. To address this challenge, we propose FSMODNet—a novel framework that, for the first time, integrates deformable attention into FSMOD to enable dynamic cross-modal feature alignment and complementary enhancement. FSMODNet further combines a feature pyramid network with few-shot learning strategies to improve generalization under data scarcity and extreme illumination conditions. Extensive experiments on two public multispectral benchmarks demonstrate that FSMODNet consistently outperforms multiple strong baselines built upon state-of-the-art detectors. Ablation studies confirm the effectiveness of each component, particularly the deformable attention module in bridging modality gaps. Overall, FSMODNet achieves superior detection accuracy and robustness in resource-constrained scenarios, establishing new performance benchmarks for few-shot multispectral detection.
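The summary above credits deformable attention with bridging the visible–thermal modality gap. As a hedged illustration only, the sketch below implements a minimal single-scale deformable cross-attention in PyTorch, where queries from one modality sample a few learned offset locations in the other modality's feature map. The class name, offset scale, and number of sampling points are assumptions for the sketch, not the paper's exact module design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalDeformableAttention(nn.Module):
    """Minimal single-scale deformable cross-attention (illustrative sketch).

    Queries from one modality predict K (dx, dy) offsets and attention
    weights, then bilinearly sample the other modality's feature map at
    those offset locations and aggregate the samples.
    """
    def __init__(self, dim, n_points=4):
        super().__init__()
        self.n_points = n_points
        self.offset_proj = nn.Linear(dim, n_points * 2)  # (dx, dy) per point
        self.weight_proj = nn.Linear(dim, n_points)      # per-point weights
        self.value_proj = nn.Conv2d(dim, dim, 1)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, query_feat, kv_feat):
        # query_feat, kv_feat: (B, C, H, W) features from the two modalities
        B, C, H, W = query_feat.shape
        q = query_feat.flatten(2).transpose(1, 2)        # (B, HW, C)
        offsets = self.offset_proj(q).view(B, H * W, self.n_points, 2)
        weights = self.weight_proj(q).softmax(-1)        # (B, HW, P)
        value = self.value_proj(kv_feat)

        # Reference grid in [-1, 1] (grid_sample convention: x, y order)
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H), torch.linspace(-1, 1, W),
            indexing="ij")
        ref = torch.stack([xs, ys], -1).view(1, H * W, 1, 2).to(query_feat)
        # Bounded learned offsets around each reference point
        loc = (ref + offsets.tanh() * 0.1).clamp(-1, 1)

        # Sample the other modality at the offset locations: (B, C, HW, P)
        sampled = F.grid_sample(value, loc, align_corners=True)
        out = (sampled * weights.unsqueeze(1)).sum(-1)   # (B, C, HW)
        out = self.out_proj(out.transpose(1, 2))         # (B, HW, C)
        return out.transpose(1, 2).view(B, C, H, W)
```

In a fused detector this module would typically be applied symmetrically (visible queries attending to thermal features and vice versa) before the feature pyramid, but that wiring is an assumption here.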

📝 Abstract
Few-shot multispectral object detection (FSMOD) addresses the challenge of detecting objects across visible and thermal modalities with minimal annotated data. In this paper, we explore this complex task and introduce a framework named "FSMODNet" that leverages cross-modality feature integration to improve detection performance even with limited labels. By effectively combining the unique strengths of visible and thermal imagery using deformable attention, the proposed method demonstrates robust adaptability in complex illumination and environmental conditions. Experimental results on two public datasets show effective object detection performance in challenging low-data regimes, outperforming several baselines we established from state-of-the-art models. All code, models, and experimental data splits can be found at https://anonymous.4open.science/r/Test-B48D.
Problem

Research questions and friction points this paper is trying to address.

Detecting objects across visible and thermal modalities with minimal annotated data
Improving detection performance through cross-modality feature integration with limited labels
Achieving robust adaptability in complex illumination and environmental conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates deformable attention into FSMOD for dynamic cross-modal feature alignment and complementary enhancement
Combines a feature pyramid network with few-shot learning strategies to generalize under data scarcity
Fuses the complementary strengths of visible and thermal imagery for robustness under extreme illumination