MultiTaskVIF: Segmentation-oriented visible and infrared image fusion via multi-task learning

📅 2025-05-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing visible-infrared image fusion (VIF) methods typically adopt a cascaded fusion-then-segmentation architecture, which causes model redundancy, high computational overhead, and implicit rather than explicit semantic modeling during fusion. Method: We propose MultiTaskVIF, a lightweight, general-purpose multi-task VIF framework. It employs a shared encoder and a dual-path multi-task head (MTH) decoder to jointly generate, end to end, high-fidelity fused images and pixel-level semantic segmentation maps, embedding segmentation supervision directly into the fusion process. Contribution/Results: MultiTaskVIF eliminates the need for a separate segmentation model, substantially reducing parameter count and computational cost. Extensive experiments on multiple benchmark datasets demonstrate superior fusion quality and downstream segmentation accuracy compared with state-of-the-art methods, validating the effectiveness and efficiency of semantic-guided fusion.

📝 Abstract
Visible and infrared image fusion (VIF) has attracted significant attention in recent years. Traditional VIF methods primarily focus on generating fused images with high visual quality, while recent advancements increasingly emphasize incorporating semantic information into the fusion model during training. However, most existing segmentation-oriented VIF methods adopt a cascade structure comprising separate fusion and segmentation models, leading to increased network complexity and redundancy. This raises a critical question: can we design a more concise and efficient structure to integrate semantic information directly into the fusion model during training? Inspired by multi-task learning, we propose a concise and universal training framework, MultiTaskVIF, for segmentation-oriented VIF models. In this framework, we introduce a multi-task head decoder (MTH) to simultaneously output both the fused image and the segmentation result during training. Unlike previous cascade training frameworks that necessitate joint training with a complete segmentation model, MultiTaskVIF enables the fusion model to learn semantic features by simply replacing its decoder with MTH. Extensive experimental evaluations validate the effectiveness of the proposed method. Our code will be released upon acceptance.
Problem

Research questions and friction points this paper is trying to address.

Integrate semantic information into visible and infrared image fusion
Reduce network complexity in segmentation-oriented VIF methods
Design a concise multi-task framework for simultaneous fusion and segmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-task learning for image fusion and segmentation
Multi-task head decoder outputs fused image and segmentation
Simplified training framework without separate segmentation model
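The core idea above (a shared encoder whose decoder is replaced by a multi-task head emitting both a fused image and a segmentation map) can be sketched in PyTorch. This is a minimal illustrative skeleton, not the paper's actual architecture: the layer widths, branch depths, class count, and the choice to concatenate the visible and infrared inputs at the encoder are all assumptions for demonstration.

```python
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    """Hypothetical multi-task head (MTH): two branches over shared features,
    one producing the fused image, the other per-pixel segmentation logits."""
    def __init__(self, feat_ch: int = 32, num_classes: int = 9):
        super().__init__()
        self.fusion_branch = nn.Sequential(
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, 1, 3, padding=1), nn.Sigmoid())  # fused grayscale image in [0, 1]
        self.seg_branch = nn.Sequential(
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, num_classes, 1))  # per-pixel class logits

    def forward(self, feats):
        return self.fusion_branch(feats), self.seg_branch(feats)

class MultiTaskVIFSketch(nn.Module):
    """Shared encoder over both modalities + MTH decoder; no separate
    segmentation network is attached during training."""
    def __init__(self, feat_ch: int = 32, num_classes: int = 9):
        super().__init__()
        # Encoder over channel-concatenated visible + infrared inputs (assumed fusion point).
        self.encoder = nn.Sequential(
            nn.Conv2d(2, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.head = MultiTaskHead(feat_ch, num_classes)

    def forward(self, vis, ir):
        feats = self.encoder(torch.cat([vis, ir], dim=1))
        return self.head(feats)

# Both outputs come from one forward pass, so a segmentation loss on `seg_logits`
# supervises the same features that produce the fused image.
model = MultiTaskVIFSketch()
vis, ir = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
fused, seg_logits = model(vis, ir)
```

At inference time the segmentation branch can simply be dropped, leaving a plain fusion model; that is what makes the framework lighter than cascaded fusion-plus-segmentation pipelines.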