IRDFusion: Iterative Relation-Map Difference guided Feature Fusion for Multispectral Object Detection

📅 2025-09-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address background noise and common-mode interference in feature fusion for multispectral object detection, this paper proposes a fusion framework based on a cross-modal feature contrast and screening strategy combined with iterative differential feedback. It introduces two modules, the Mutual Feature Refinement Module (MFRM) and the Differential Feature Feedback Module (DFFM), which together emulate a feedback differential amplifier: the DFFM computes inter-modal differential features as guidance signals and feeds them back to the MFRM. By modeling intra- and inter-modal relationships and refining them iteratively, the framework adaptively enhances salient structural features while suppressing noise shared across modalities. Evaluated on the FLIR, LLVIP, and M$^3$FD benchmarks, the method is reported to achieve state-of-the-art performance and to consistently outperform existing methods across diverse challenging scenarios.

📝 Abstract

Current multispectral object detection methods often retain extraneous background or noise during feature fusion, which limits perceptual performance. To address this, we propose a feature fusion framework based on a cross-modal feature contrast and screening strategy, diverging from conventional approaches. The proposed method adaptively enhances salient structures by fusing object-aware, complementary cross-modal features while suppressing shared background interference. Our solution centers on two specially designed modules: the Mutual Feature Refinement Module (MFRM) and the Differential Feature Feedback Module (DFFM). The MFRM enhances intra- and inter-modal feature representations by modeling their relationships, thereby improving cross-modal alignment and discriminative power. Inspired by feedback differential amplifiers, the DFFM dynamically computes inter-modal differential features as guidance signals and feeds them back to the MFRM, enabling adaptive fusion of complementary information while suppressing common-mode noise across modalities. To enable robust feature learning, the MFRM and DFFM are integrated into a unified framework, formulated as an Iterative Relation-Map Differential Guided Feature Fusion mechanism, termed IRDFusion. IRDFusion enables high-quality cross-modal fusion by progressively amplifying salient relational signals through iterative feedback while suppressing feature noise, leading to significant performance gains. In extensive experiments on the FLIR, LLVIP, and M$^3$FD datasets, IRDFusion achieves state-of-the-art performance and consistently outperforms existing methods across diverse challenging scenarios, demonstrating its robustness and effectiveness. Code will be available at https://github.com/61s61min/IRDFusion.git.
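The feedback loop described in the abstract can be illustrated with a toy sketch. This is not the authors' MFRM or DFFM: it uses no learned projections, and the function name `toy_iterative_fusion` and the parameters `steps` and `gain` are hypothetical. It only shows the general pattern of computing a relation map per modality, taking their difference as a guidance signal, and feeding that signal back over several iterations.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    """Normalise each row so that its entries are positive and sum to 1."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([v / total for v in exps])
    return out

def relation_map(query, key):
    """Token-to-token similarity map (attention without learned weights)."""
    scale = math.sqrt(len(query[0]))
    key_t = [list(col) for col in zip(*key)]
    scores = matmul(query, key_t)
    return softmax_rows([[v / scale for v in row] for row in scores])

def toy_iterative_fusion(rgb, ir, steps=3, gain=0.5):
    """Assumed toy loop: rgb and ir are N x D token features of equal shape."""
    # Start from the plain average of the two modalities.
    fused = [[(x + y) / 2 for x, y in zip(r, t)] for r, t in zip(rgb, ir)]
    for _ in range(steps):
        # Refinement stand-in: relate the current fused tokens to each modality.
        rel_rgb = relation_map(fused, rgb)
        rel_ir = relation_map(fused, ir)
        # Feedback stand-in: the difference map cancels relations shared by both.
        diff = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(rel_rgb, rel_ir)]
        neg_diff = [[-d for d in row] for row in diff]
        # Aggregate each modality with its share of the differential guidance.
        upd_rgb = matmul(diff, rgb)
        upd_ir = matmul(neg_diff, ir)
        fused = [[f + gain * (u + v) for f, u, v in zip(fr, ur, vr)]
                 for fr, ur, vr in zip(fused, upd_rgb, upd_ir)]
    return fused

# Three tokens with two channels each; the values are made up for illustration.
rgb_tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
ir_tokens = [[0.0, 2.0], [1.0, 0.0], [0.5, 0.5]]
fused_tokens = toy_iterative_fusion(rgb_tokens, ir_tokens)
```

When both modalities carry identical features, the difference map is zero and the fused output equals the input, which mirrors the common-mode suppression the abstract describes. How the actual modules compute and apply the guidance signal is not specified in this summary.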

Problem

Research questions and friction points this paper is trying to address.

Reducing background noise in multispectral object detection

Enhancing cross-modal feature fusion for better performance

Suppressing common-mode interference while fusing complementary features
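The notion of common-mode interference comes from differential amplifiers, where two inputs are split into a shared (common-mode) part and a difference (differential-mode) part. The sketch below applies that standard decomposition to two made-up 1-D signals; the signal values and the helper name `split_modes` are hypothetical and are not taken from the paper.

```python
import random

def split_modes(a, b):
    """Return the common-mode and differential-mode parts of two signals."""
    common = [(x + y) / 2 for x, y in zip(a, b)]
    differential = [x - y for x, y in zip(a, b)]
    return common, differential

random.seed(0)
# Background that appears identically in both modalities (the common-mode part).
background = [random.gauss(0.0, 1.0) for _ in range(8)]
# Modality-specific object responses at different positions (assumed values).
rgb_object = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
ir_object = [0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0]

rgb = [n + s for n, s in zip(background, rgb_object)]
ir = [n + s for n, s in zip(background, ir_object)]

common, diff = split_modes(rgb, ir)
# The shared background cancels in diff, leaving only rgb_object - ir_object.
```

Because the background is added to both signals, it vanishes from the differential part, while the modality-specific responses survive with opposite signs.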

Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative feedback mechanism for feature fusion

Differential Feature Feedback Module suppresses noise

Mutual Feature Refinement Module enhances cross-modal alignment

🔎 Similar Papers

RGBT Tracking via All-layer Multimodal Interactions with Progressive Fusion Mamba