OCCO: LVM-guided Infrared and Visible Image Fusion Framework based on Object-aware and Contextual COntrastive Learning

📅 2025-03-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing infrared–visible image fusion methods struggle to simultaneously achieve high fusion quality and strong downstream task performance. To address this, we propose OCCO, an LVM-guided fusion framework that introduces large vision model (LVM)-based semantic distillation for object-aware perception and contextual modeling—marking the first such application in fusion. OCCO features a dual-path contrastive learning mechanism explicitly preserving target integrity within the fused feature space, alongside a cross-modal feature interaction fusion network designed to mitigate modality conflicts. Evaluated on four benchmark datasets, OCCO consistently outperforms eight state-of-the-art methods, achieving significant gains in both fusion quality metrics (e.g., PSNR, SSIM) and downstream object detection performance (up to +3.2% mAP). The framework thus enables synergistic optimization of high-fidelity fusion and robust task generalization.

Technology Category

Application Category

📝 Abstract
Image fusion is a crucial technique in the field of computer vision, and its goal is to generate high-quality fused images and improve the performance of downstream tasks. However, existing fusion methods struggle to balance these two factors. Achieving high quality in fused images may result in lower performance in downstream visual tasks, and vice versa. To address this drawback, a novel LVM (large vision model)-guided fusion framework with Object-aware and Contextual COntrastive learning is proposed, termed as OCCO. The pre-trained LVM is utilized to provide semantic guidance, allowing the network to focus solely on fusion tasks while emphasizing learning salient semantic features in form of contrastive learning. Additionally, a novel feature interaction fusion network is also designed to resolve information conflicts in fusion images caused by modality differences. By learning the distinction between positive samples and negative samples in the latent feature space (contextual space), the integrity of target information in fused image is improved, thereby benefiting downstream performance. Finally, compared with eight state-of-the-art methods on four datasets, the effectiveness of the proposed method is validated, and exceptional performance is also demonstrated on downstream visual task.
Problem

Research questions and friction points this paper is trying to address.

Balancing fused image quality and downstream task performance
Resolving modality-induced information conflicts in fusion
Enhancing target integrity via contrastive semantic learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

LVM-guided fusion with semantic contrastive learning
Object-aware feature interaction network design
Contextual contrastive learning for target integrity
🔎 Similar Papers
No similar papers found.
H
Hui Li
International Joint Laboratory on Artificial Intelligence of Jiangsu Province, School of Artificial Intelligence and Computer Science, Jiangnan University, LihuRoad, Wuxi, 100190, Jiangsu, China.
C
Congcong Bian
International Joint Laboratory on Artificial Intelligence of Jiangsu Province, School of Artificial Intelligence and Computer Science, Jiangnan University, LihuRoad, Wuxi, 100190, Jiangsu, China.
Z
Zeyang Zhang
International Joint Laboratory on Artificial Intelligence of Jiangsu Province, School of Artificial Intelligence and Computer Science, Jiangnan University, LihuRoad, Wuxi, 100190, Jiangsu, China.
Xiaoning Song
Xiaoning Song
Professor of Computer Vision and Pattern Recognition, Jiangnan University
Pattern RecognitionComputer VisionArtificial Intelligence
X
Xi Li
College of Computer Science and Technology, Zhejiang University, Hangzhou, 310007, Zhejiang, China.
Xiao-Jun Wu
Xiao-Jun Wu
School of Artificial Intelligence and Computer Science, Jiangnan University
artificial intelligencepattern recognitionmachine learning