Wavelet-guided Misalignment-aware Network for Visible-Infrared Object Detection

📅 2025-07-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Visible-light–infrared cross-modal object detection is commonly hindered by feature misalignment arising from resolution disparities, spatial shifts, and modality inconsistencies, leading to alignment difficulties and noise interference. To address this, we propose a misalignment-aware unified detection framework featuring two key innovations: (1) wavelet-guided multi-frequency feature decomposition, which decouples features across frequency domains via discrete wavelet transform; and (2) modality-aware adaptive fusion, employing misalignment-sensitive cross-modal guidance to dynamically rectify misaligned features and suppress spurious responses. Our approach achieves state-of-the-art performance on DVTOD, DroneVehicle, and M3FD benchmarks, significantly improving detection accuracy and robustness under severe misalignment conditions. By explicitly modeling cross-modal alignment in the frequency domain, the method provides an interpretable and generalizable paradigm for cross-modal feature learning.
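The first innovation, decoupling features across frequency domains via the discrete wavelet transform, can be illustrated with a minimal single-level 2D Haar DWT. This is an illustrative sketch only, not the paper's implementation: the function name `haar_dwt2` and the plain NumPy formulation are assumptions, and a real network would apply this per channel inside the backbone.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar DWT of a feature map x of shape (H, W), H and W even.

    Returns the low-frequency approximation (LL) and the three
    high-frequency detail sub-bands (LH, HL, HH), each of shape (H/2, W/2).
    """
    # Row transform: pairwise averages (low-pass) and differences (high-pass)
    lo_r = (x[:, 0::2] + x[:, 1::2]) / 2.0
    hi_r = (x[:, 0::2] - x[:, 1::2]) / 2.0
    # Column transform on each row-filtered result
    ll = (lo_r[0::2, :] + lo_r[1::2, :]) / 2.0
    lh = (lo_r[0::2, :] - lo_r[1::2, :]) / 2.0
    hl = (hi_r[0::2, :] + hi_r[1::2, :]) / 2.0
    hh = (hi_r[0::2, :] - hi_r[1::2, :]) / 2.0
    return ll, (lh, hl, hh)

# Decompose visible and infrared feature maps into matched sub-bands,
# so alignment can then be handled per frequency band.
vis = np.random.rand(8, 8).astype(np.float32)
ir = np.random.rand(8, 8).astype(np.float32)
vis_ll, vis_high = haar_dwt2(vis)
ir_ll, ir_high = haar_dwt2(ir)
print(vis_ll.shape)  # (4, 4)
```

With this normalization, summing the four sub-bands at each position recovers the top-left sample of the corresponding 2x2 block, which is a convenient sanity check that the decomposition loses no information.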

📝 Abstract
Visible-infrared object detection aims to enhance detection robustness by exploiting the complementary information of visible and infrared image pairs. However, its performance is often limited by frequent misalignments caused by resolution disparities, spatial displacements, and modality inconsistencies. To address this issue, we propose the Wavelet-guided Misalignment-aware Network (WMNet), a unified framework designed to adaptively address different cross-modal misalignment patterns. WMNet incorporates wavelet-based multi-frequency analysis and modality-aware fusion mechanisms to improve the alignment and integration of cross-modal features. By jointly exploiting low- and high-frequency information and introducing adaptive guidance across modalities, WMNet alleviates the adverse effects of noise, illumination variation, and spatial misalignment. Furthermore, it enhances the representation of salient target features while suppressing spurious or misleading information, thereby promoting more accurate and robust detection. Extensive evaluations on the DVTOD, DroneVehicle, and M3FD datasets demonstrate that WMNet achieves state-of-the-art performance on misaligned cross-modal object detection tasks, confirming its effectiveness and practical applicability.
Problem

Research questions and friction points this paper is trying to address.

Address misalignments in visible-infrared object detection
Improve cross-modal feature alignment and fusion
Enhance detection robustness against noise and misalignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Wavelet-guided multi-frequency analysis for alignment
Modality-aware fusion for cross-modal integration
Adaptive guidance to reduce noise and misalignment
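The modality-aware fusion idea, down-weighting positions where the two modalities disagree, can be sketched as below. This is a hypothetical stand-in for the paper's learned guidance: the function `adaptive_fusion`, the cosine-similarity alignment score, and the fallback to the stronger modality are all assumptions made for illustration.

```python
import numpy as np

def adaptive_fusion(f_vis, f_ir, eps=1e-6):
    """Hypothetical misalignment-aware fusion of two feature maps of shape (C, H, W).

    Per-position cosine similarity across channels serves as an alignment
    score; poorly aligned positions lean on the dominant modality rather
    than a naive average, suppressing spurious cross-modal responses.
    """
    # Channel-wise cosine similarity at each spatial position
    num = (f_vis * f_ir).sum(axis=0)
    den = np.linalg.norm(f_vis, axis=0) * np.linalg.norm(f_ir, axis=0) + eps
    align = (num / den + 1.0) / 2.0  # map [-1, 1] -> [0, 1]
    # Aligned positions: average both modalities; misaligned positions:
    # keep the stronger response (a crude proxy for learned guidance).
    avg = 0.5 * (f_vis + f_ir)
    strongest = np.where(np.abs(f_vis) >= np.abs(f_ir), f_vis, f_ir)
    return align * avg + (1.0 - align) * strongest

f_vis = np.random.rand(16, 4, 4).astype(np.float32)
f_ir = np.random.rand(16, 4, 4).astype(np.float32)
fused = adaptive_fusion(f_vis, f_ir)
print(fused.shape)  # (16, 4, 4)
```

When the two inputs are identical, the alignment score saturates and the fusion reduces to the shared feature, which matches the intuition that guidance should only intervene where the modalities conflict.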
Haote Zhang
School of Computer Science, Jiangsu University of Technology, Changzhou, China
Lipeng Gu
School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Wuzhou Quan
Nanjing University of Aeronautics and Astronautics
Computer Vision, Pattern Recognition, Remote Sensing
Fu Lee Wang
Hong Kong Metropolitan University
AI, Data Science, Learning Technology
Honghui Fan
School of Computer Science, Jiangsu University of Technology, Changzhou, China
Jiali Tang
School of Computer Science, Jiangsu University of Technology, Changzhou, China
Dingkun Zhu
School of Computer Science, Jiangsu University of Technology, Changzhou, China
Haoran Xie
School of Data Science, Lingnan University, Hong Kong, China
Xiaoping Zhang
China National Bamboo Research Center
Bamboo, Soil Ecology, Metagenomics
Mingqiang Wei
Professor at Nanjing University of Aeronautics and Astronautics
3D Vision, Multimodal Fusion, Computer Graphics, Deep Geometry Learning, CAD