SSVIF: Self-Supervised Segmentation-Oriented Visible and Infrared Image Fusion

📅 2025-09-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Visible and infrared image fusion (VIF) for semantic segmentation heavily relies on costly pixel- or task-level annotations, hindering scalable data construction. Method: We propose the first self-supervised, segmentation-oriented VIF framework that requires no labeled data. It introduces a cross-segmentation consistency mechanism that enforces agreement between feature-level and pixel-level fusion-based segmentation predictions, coupled with a two-stage training strategy and dynamic weight adjustment to jointly optimize the fusion and segmentation objectives. Technically, it integrates self-supervised learning, multi-granularity fusion modeling, and consistency regularization. Results: On public benchmarks, our method significantly outperforms traditional VIF approaches and achieves segmentation accuracy competitive with state-of-the-art supervised segmentation-oriented methods, demonstrating for the first time that high-quality fusion and improved downstream segmentation can be driven solely by unlabeled visible-infrared image pairs.

📝 Abstract
Visible and infrared image fusion (VIF) has gained significant attention in recent years due to its wide application in tasks such as scene segmentation and object detection. VIF methods can be broadly classified into traditional VIF methods and application-oriented VIF methods. Traditional methods focus solely on improving the quality of fused images, while application-oriented VIF methods additionally consider the performance of downstream tasks on fused images by introducing task-specific loss terms during training. However, compared to traditional methods, application-oriented VIF methods require datasets labeled for downstream tasks (e.g., semantic segmentation or object detection), making data acquisition labor-intensive and time-consuming. To address this issue, we propose a self-supervised training framework for segmentation-oriented VIF methods (SSVIF). Leveraging the consistency between feature-level fusion-based segmentation and pixel-level fusion-based segmentation, we introduce a novel self-supervised task, cross-segmentation consistency, that enables the fusion model to learn high-level semantic features without the supervision of segmentation labels. Additionally, we design a two-stage training strategy and a dynamic weight adjustment method for effective joint learning within our self-supervised framework. Extensive experiments on public datasets demonstrate the effectiveness of our proposed SSVIF. Remarkably, although trained only on unlabeled visible-infrared image pairs, our SSVIF outperforms traditional VIF methods and rivals supervised segmentation-oriented ones. Our code will be released upon acceptance.
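The paper's exact loss formulation is not given here, but the core idea of cross-segmentation consistency is to penalize disagreement between the segmentation maps produced from feature-level fusion and from pixel-level fusion. A minimal sketch, assuming both paths emit per-pixel softmax probabilities (the function name and the symmetric-KL choice of consistency measure are illustrative, not the paper's):

```python
import numpy as np

def cross_segmentation_consistency(p_feat, p_pix, eps=1e-8):
    """Symmetric KL divergence between two per-pixel class-probability maps.

    p_feat: segmentation probabilities from the feature-level fusion path, shape (H, W, C)
    p_pix:  segmentation probabilities from the pixel-level fusion path,   shape (H, W, C)
    Returns a scalar consistency loss (0 when the two paths agree exactly).
    """
    # KL(p_feat || p_pix) per pixel, summed over the class dimension
    kl_fp = np.sum(p_feat * (np.log(p_feat + eps) - np.log(p_pix + eps)), axis=-1)
    # KL(p_pix || p_feat) per pixel
    kl_pf = np.sum(p_pix * (np.log(p_pix + eps) - np.log(p_feat + eps)), axis=-1)
    # Symmetrize and average over all pixels
    return float(np.mean(kl_fp + kl_pf) / 2.0)

# Illustration: identical predictions incur zero loss, disagreement a positive one.
uniform = np.full((4, 4, 3), 1.0 / 3.0)
peaked = np.zeros((4, 4, 3)); peaked[..., 0] = 0.98; peaked[..., 1:] = 0.01
loss_same = cross_segmentation_consistency(uniform, uniform)
loss_diff = cross_segmentation_consistency(uniform, peaked)
```

Because the target of each path is the other path's prediction rather than a ground-truth mask, no segmentation labels are needed; any symmetric divergence (e.g., JS divergence or an L2 distance on probabilities) would serve the same role.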
Problem

Research questions and friction points this paper is trying to address.

Developing self-supervised visible-infrared fusion without segmentation labels
Addressing labor-intensive data requirements for application-oriented fusion methods
Enhancing semantic feature learning through cross-segmentation consistency framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised framework for segmentation-oriented image fusion
Two-stage training strategy with dynamic weight adjustment
Cross-segmentation consistency as a self-supervised task requiring no segmentation labels
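The two-stage training strategy and dynamic weight adjustment are described only at a high level; a plausible reading is that the fusion objective is trained alone first, then the consistency objective is ramped in. A hypothetical schedule illustrating that pattern (the function, the 50/50 stage split, and the linear ramp are all assumptions, not the paper's published schedule):

```python
def dynamic_weights(epoch, total_epochs, stage1_frac=0.5):
    """Illustrative two-stage loss weighting.

    Stage 1 (first stage1_frac of training): optimize the fusion loss alone.
    Stage 2: keep the fusion weight fixed and ramp the consistency weight
    linearly from 0 to 1, so segmentation consistency is introduced only
    after the fusion model produces reasonable images.
    Returns (w_fusion, w_consistency).
    """
    stage1_end = int(total_epochs * stage1_frac)
    if epoch < stage1_end:
        return 1.0, 0.0
    progress = (epoch - stage1_end) / max(1, total_epochs - stage1_end)
    return 1.0, progress

# Example: total = w_fusion * fusion_loss + w_consistency * consistency_loss
w_f, w_c = dynamic_weights(epoch=9, total_epochs=10)
```

Deferring the consistency term avoids letting a noisy early-training segmentation signal distort the fusion network before it can produce usable fused images.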