CtrlFuse: Mask-Prompt Guided Controllable Infrared and Visible Image Fusion

📅 2026-01-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing infrared and visible image fusion methods struggle to balance downstream task adaptability with semantic controllability. To address this limitation, this work proposes a mask prompt–guided controllable fusion framework that introduces, for the first time, an interactive mask prompting mechanism. A reference prompt encoder dynamically extracts task-specific semantics, which are then explicitly injected into the fusion process. The framework jointly optimizes fusion and segmentation objectives, enabling effective synergy between multimodal features and semantic prompts. Experimental results demonstrate state-of-the-art performance in both fusion controllability and segmentation accuracy, with the fine-tuned segmentation branch even surpassing the original pre-trained model.
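The prompt-injection idea above can be sketched in a toy form. This is a minimal, hypothetical illustration, not CtrlFuse's actual architecture or API: all names (`encode_prompt`, `fuse_with_prompt`) and the pooling/gating logic are assumptions. A binary mask prompt is pooled into a per-channel gate, which then re-weights naively averaged infrared/visible features so that prompted semantics are emphasized.

```python
# Toy sketch (assumed design, not the paper's method): a mask prompt is
# encoded into per-channel gates that modulate the fused features.

def encode_prompt(mask, num_channels):
    """Toy reference-prompt encoder: pool the mask into one gate per channel."""
    flat = [v for row in mask for v in row]
    coverage = sum(flat) / len(flat)        # fraction of prompted pixels
    return [coverage] * num_channels

def fuse_with_prompt(ir_feat, vis_feat, gates):
    """Toy prompt-semantic fusion: average the two modalities per channel,
    then scale each channel by (1 + gate) to emphasize prompted semantics."""
    fused = []
    for c, gate in enumerate(gates):
        channel = [[0.5 * (a + b) * (1.0 + gate)
                    for a, b in zip(ir_row, vis_row)]
                   for ir_row, vis_row in zip(ir_feat[c], vis_feat[c])]
        fused.append(channel)
    return fused

# 2-channel, 2x2 example with a mask prompting only the top-left pixel.
ir   = [[[1.0, 0.0], [0.0, 0.0]], [[0.5, 0.5], [0.5, 0.5]]]
vis  = [[[0.0, 1.0], [0.0, 0.0]], [[0.5, 0.5], [0.5, 0.5]]]
mask = [[1, 0], [0, 0]]                     # coverage = 0.25, gate = 1.25

out = fuse_with_prompt(ir, vis, encode_prompt(mask, num_channels=2))
print(out[0][0][0])                         # 0.5 * (1.0 + 0.0) * 1.25 = 0.625
```

In the real model the prompt encoder is a fine-tuned pre-trained segmentation network and the injection is learned, but the control flow, mask in, semantic embedding out, embedding modulates fusion features, follows this shape.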

📝 Abstract
Infrared and visible image fusion generates all-weather perception-capable images by combining complementary modalities, enhancing environmental awareness for intelligent unmanned systems. Existing methods either focus on pixel-level fusion while overlooking downstream task adaptability, or implicitly learn rigid semantics through cascaded detection/segmentation models and thus cannot interactively address diverse semantic target perception needs. We propose CtrlFuse, a controllable image fusion framework that enables interactive dynamic fusion guided by mask prompts. The model integrates a multi-modal feature extractor, a reference prompt encoder (RPE), and a prompt-semantic fusion module (PSFM). The RPE dynamically encodes task-specific semantic prompts by fine-tuning pre-trained segmentation models with input mask guidance, while the PSFM explicitly injects these semantics into fusion features. Through synergistic optimization of parallel segmentation and fusion branches, our method achieves mutual enhancement between task performance and fusion quality. Experiments demonstrate state-of-the-art results in both fusion controllability and segmentation accuracy, with the adapted task branch even outperforming the original segmentation model.
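The "synergistic optimization of parallel segmentation and fusion branches" amounts to training against a weighted joint objective. The sketch below shows one plausible form of such a loss; the specific terms (max-intensity fusion target, binary cross-entropy over the prompted region) and the weight `lam` are illustrative assumptions, not the paper's actual objective.

```python
import math

# Assumed joint objective (not CtrlFuse's exact loss): a pixel-level fusion
# term plus a weighted segmentation term over the prompted mask region.

def fusion_loss(fused, ir, vis):
    """Encourage the fused image to retain the brighter source pixel (MSE
    against the elementwise max of the two modalities)."""
    target = [max(a, b) for a, b in zip(ir, vis)]
    return sum((f - t) ** 2 for f, t in zip(fused, target)) / len(fused)

def seg_loss(pred, mask):
    """Binary cross-entropy between predicted probabilities and the mask."""
    eps = 1e-7
    return -sum(m * math.log(p + eps) + (1 - m) * math.log(1 - p + eps)
                for p, m in zip(pred, mask)) / len(pred)

def total_loss(fused, ir, vis, pred, mask, lam=0.5):
    """Weighted sum; lam trades fusion quality against task accuracy."""
    return fusion_loss(fused, ir, vis) + lam * seg_loss(pred, mask)

# Tiny 3-pixel example: the fused image matches the max target exactly,
# so only the segmentation term contributes.
ir, vis = [0.2, 0.8, 0.5], [0.6, 0.3, 0.5]
fused   = [0.6, 0.8, 0.5]
pred, mask = [0.9, 0.1, 0.9], [1, 0, 1]
print(round(total_loss(fused, ir, vis, pred, mask), 4))
```

Jointly minimizing both terms is what lets the two branches reinforce each other: gradients from the segmentation term push the fusion features toward task-relevant semantics, while the fusion term preserves pixel-level quality.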
Problem

Research questions and friction points this paper is trying to address.

infrared and visible image fusion
downstream task adaptability
interactive semantic control
controllable fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

controllable fusion
mask-prompt guidance
semantic-aware fusion
multi-modal image fusion
interactive fusion