🤖 AI Summary
Existing infrared and visible image fusion methods suffer significant performance degradation under degraded inputs because they rely on hand-crafted preprocessing, decoupling degradation handling from fusion. This work proposes an end-to-end framework that jointly models degradations and performs fusion, leveraging vision-language models (VLMs) for degradation-aware perception and guided suppression: a Specific-Prompt Degradation-Coupled Extractor (SPDCE) suppresses intra-modal degradations, while a Joint-Prompt Degradation-Coupled Fusion module (JPDCF) models cross-modal fusion. Crucially, the method exploits VLMs' semantic understanding via prompt engineering to unify degradation identification and feature fusion, breaking the conventional two-stage paradigm. Extensive experiments on diverse real-world degradation scenarios demonstrate substantial improvements over state-of-the-art methods, validating superior robustness and generalization.
📝 Abstract
Existing Infrared and Visible Image Fusion (IVIF) methods typically assume high-quality inputs. However, when handling degraded images, these methods rely heavily on manually switching between different pre-processing techniques. This decoupling of degradation handling and image fusion leads to significant performance degradation. In this paper, we propose a novel VLM-Guided Degradation-Coupled Fusion network (VGDCFusion), which tightly couples degradation modeling with the fusion process and leverages vision-language models (VLMs) for degradation-aware perception and guided suppression. Specifically, the proposed Specific-Prompt Degradation-Coupled Extractor (SPDCE) enables modality-specific degradation awareness and establishes joint modeling of degradation suppression and intra-modal feature extraction. In parallel, the Joint-Prompt Degradation-Coupled Fusion (JPDCF) facilitates cross-modal degradation perception and couples residual degradation filtering with complementary cross-modal feature fusion. Extensive experiments demonstrate that our VGDCFusion significantly outperforms existing state-of-the-art fusion approaches under various degraded image scenarios. Our code is available at https://github.com/Lmmh058/VGDCFusion.
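To make the abstract's data flow concrete, here is a minimal conceptual sketch of the described pipeline: each modality passes through a prompt-guided extractor (SPDCE), and the results are fused by a jointly prompted module (JPDCF). All function names, prompt strings, and weights below are illustrative assumptions, not the authors' actual implementation; in the real network a VLM produces the degradation-aware prompts and the modules are learned.

```python
# Hypothetical sketch of the VGDCFusion data flow (not the authors' code).
# Features are modeled as plain lists of floats; prompts are toy strings.

def spdce(features, specific_prompt):
    """Stand-in for the Specific-Prompt Degradation-Coupled Extractor:
    jointly suppresses intra-modal degradation and extracts features.
    Here the prompt merely selects a toy suppression gain."""
    gain = 0.5 if "noise" in specific_prompt else 1.0
    return [gain * f for f in features]

def jpdcf(ir_feats, vis_feats, joint_prompt):
    """Stand-in for Joint-Prompt Degradation-Coupled Fusion:
    filters residual degradation while fusing complementary features."""
    w = 0.6 if "low-light" in joint_prompt else 0.5  # toy cross-modal weight
    return [w * a + (1 - w) * b for a, b in zip(ir_feats, vis_feats)]

def vgdc_fusion(ir, vis):
    # In the paper a VLM infers these prompts; here they are hard-coded.
    ir_feats = spdce(ir, specific_prompt="thermal noise")
    vis_feats = spdce(vis, specific_prompt="low-light blur")
    return jpdcf(ir_feats, vis_feats, joint_prompt="low-light scene")

fused = vgdc_fusion([1.0, 2.0], [0.4, 0.8])
```

The key point the sketch mirrors is that degradation handling is not a separate pre-processing stage: each module receives a degradation prompt alongside the features, coupling suppression with extraction and fusion.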