On Modality Incomplete Infrared-Visible Object Detection: An Architecture Compatibility Perspective

📅 2025-11-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Infrared–visible object detection (IVOD) suffers severe performance degradation when a modality is missing, especially when the absent modality is the dominant one. Method: This paper proposes Scarf-DETR, the first DETR-based detection framework supporting arbitrary modality combinations, grounded in architectural compatibility. Its core innovations include (i) a plug-and-play Scarf Neck module, (ii) a pseudo-modality dropout training strategy, and (iii) a modality-agnostic deformable attention mechanism, enabling unified modeling of single- and dual-modality inputs. Additionally, the paper introduces the first comprehensive IVOD benchmark covering both dominant- and subordinate-modality missing scenarios. Results: Experiments demonstrate that Scarf-DETR significantly outperforms existing methods under incomplete-modality conditions, achieves state-of-the-art accuracy on standard IVOD benchmarks, and exhibits strong robustness, high cross-modal compatibility, and practical deployability.

📝 Abstract
Infrared and visible object detection (IVOD) is essential for numerous around-the-clock applications. Despite notable advancements, current IVOD models exhibit significant performance declines when confronted with incomplete modality data, particularly if the dominant modality is missing. In this paper, we conduct a thorough investigation of the modality-incomplete IVOD problem from an architecture compatibility perspective. Specifically, we propose a plug-and-play Scarf Neck module for DETR variants, which introduces a modality-agnostic deformable attention mechanism to enable the IVOD detector to flexibly adapt to any single or dual modality during training and inference. When training Scarf-DETR, we design a pseudo modality dropout strategy to fully utilize the multi-modality information, making the detector compatible and robust to both single- and dual-modality working modes. Moreover, we introduce a comprehensive benchmark for the modality-incomplete IVOD task aimed at thoroughly assessing situations where the absent modality is either dominant or secondary. Our proposed Scarf-DETR not only performs excellently in missing-modality scenarios but also achieves superior performance on standard modality-complete IVOD benchmarks. Our code will be available at https://github.com/YinghuiXing/Scarf-DETR.
Problem

Research questions and friction points this paper is trying to address.

Addresses the performance decline of infrared-visible object detection with missing modalities
Proposes architecture compatibility for handling incomplete modality data during training
Introduces a benchmark for evaluating scenarios where the missing modality is dominant or secondary
Innovation

Methods, ideas, or system contributions that make the work stand out.

Plug-and-play Scarf Neck module for DETR variants
Modality-agnostic deformable attention mechanism
Pseudo modality dropout strategy for training
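The pseudo modality dropout strategy is not detailed on this page, but the general idea (randomly hiding one modality per training sample so the detector learns to operate on single- or dual-modality inputs) can be sketched as follows. This is a minimal illustration, not the paper's implementation; the per-sample feature dictionary keyed by `"ir"` and `"vis"`, the `p_drop` parameter, and the function name are all assumptions for illustration.

```python
import random

def pseudo_modality_dropout(batch, p_drop=0.3, rng=None):
    """Illustrative sketch: with probability p_drop, hide exactly one
    modality per sample, so the detector also sees single-modality
    inputs during training. At least one modality always remains.

    `batch` is a list of dicts mapping modality names ("ir", "vis")
    to feature tensors (here represented by arbitrary objects)."""
    rng = rng or random.Random()
    out = []
    for sample in batch:
        sample = dict(sample)  # shallow copy; do not mutate the input
        if rng.random() < p_drop:
            # Choose one modality uniformly at random and mask it out.
            dropped = rng.choice(["ir", "vis"])
            sample[dropped] = None
        out.append(sample)
    return out
```

In a real training loop, masking would likely replace the dropped modality's features with zeros or learned placeholder tokens rather than `None`, but the sampling logic is the same.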
Shuo Yang
Northwestern Polytechnical University, China
Yinghui Xing
Northwestern Polytechnical University, China
Shizhou Zhang
Northwestern Polytechnical University
computer vision, machine learning
Zhilong Niu
Northwestern Polytechnical University, China