Dual-Domain Perspective on Degradation-Aware Fusion: A VLM-Guided Robust Infrared and Visible Image Fusion Framework

📅 2025-09-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing infrared–visible image fusion methods assume high-quality inputs and struggle with dual-source degradations (e.g., blur, noise, low illumination), relying heavily on manual pre-enhancement that introduces error accumulation. To address this, we propose a vision-language model (VLM)-guided dual-domain joint optimization framework. First, we leverage VLMs for degradation semantic awareness—introducing the first VLM-based degradation perception in fusion. Second, we establish a synergistic mechanism integrating frequency-domain degradation modeling with spatial-domain cross-modal filtering. Third, we design a modality-specific subband extraction module coupled with adaptive feature aggregation for end-to-end degradation-adaptive fusion. Extensive experiments under diverse degradation combinations demonstrate substantial improvements in fusion quality, structural fidelity, and modality complementarity. In both qualitative and quantitative evaluations, the framework consistently outperforms state-of-the-art methods.

📝 Abstract
Most existing infrared-visible image fusion (IVIF) methods assume high-quality inputs, and therefore struggle to handle dual-source degraded scenarios, typically requiring manual selection and sequential application of multiple pre-enhancement steps. This decoupled pre-enhancement-to-fusion pipeline inevitably leads to error accumulation and performance degradation. To overcome these limitations, we propose Guided Dual-Domain Fusion (GD²Fusion), a novel framework that synergistically integrates vision-language models (VLMs) for degradation perception with dual-domain (frequency/spatial) joint optimization. Concretely, the designed Guided Frequency Modality-Specific Extraction (GFMSE) module performs frequency-domain degradation perception and suppression and discriminatively extracts fusion-relevant sub-band features. Meanwhile, the Guided Spatial Modality-Aggregated Fusion (GSMAF) module carries out cross-modal degradation filtering and adaptive multi-source feature aggregation in the spatial domain to enhance modality complementarity and structural consistency. Extensive qualitative and quantitative experiments demonstrate that GD²Fusion achieves superior fusion performance compared with existing algorithms and strategies in dual-source degraded scenarios. The code will be publicly released after acceptance of this paper.
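The frequency-domain sub-band extraction that GFMSE performs can be pictured as splitting each modality into complementary sub-bands before degradation suppression. The sketch below is only an illustration of that general idea using a radial FFT mask; the function name, the cutoff, and the mask shape are assumptions, not the paper's actual module.

```python
import numpy as np

def frequency_subbands(img, cutoff=0.1):
    """Split a grayscale image into low/high-frequency sub-bands.

    Illustrative stand-in for frequency-domain sub-band extraction:
    the radial mask and cutoff are assumed here, not taken from GFMSE.
    """
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    radius = np.sqrt((yy / h) ** 2 + (xx / w) ** 2)
    low_mask = radius <= cutoff  # keeps coarse structure/illumination
    low = np.fft.ifft2(np.fft.ifftshift(F * low_mask)).real
    high = img - low             # edges, fine texture, and noise
    return low, high
```

In a degradation-aware setting, noise suppression would typically act on the high-frequency band while illumination correction acts on the low-frequency band, which is one motivation for operating on sub-bands separately.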
Problem

Research questions and friction points this paper is trying to address.

Handles dual-source degraded infrared-visible image fusion
Integrates vision-language models for degradation perception
Performs joint optimization in frequency and spatial domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

VLM-guided dual-domain joint optimization
Frequency-domain degradation perception and suppression
Cross-modal degradation filtering and feature aggregation
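The spatial-domain aggregation described for GSMAF can be sketched as a per-pixel soft weighting of the two modality feature maps. The confidence-score softmax below is a hypothetical mechanism chosen for illustration; the paper does not specify this particular weighting.

```python
import numpy as np

def adaptive_aggregate(feat_ir, feat_vis, score_ir, score_vis):
    """Per-pixel soft aggregation of infrared and visible feature maps.

    Hypothetical sketch of adaptive multi-source aggregation: scores
    (e.g., per-pixel quality estimates) are turned into softmax weights,
    so the less degraded modality dominates at each location.
    """
    scores = np.stack([score_ir, score_vis])             # (2, H, W)
    scores = scores - scores.max(axis=0, keepdims=True)  # numerically stable
    w = np.exp(scores)
    w = w / w.sum(axis=0, keepdims=True)                 # weights sum to 1
    return w[0] * feat_ir + w[1] * feat_vis
```

With equal scores this reduces to a plain average; skewed scores let one modality dominate, which is the behavior a cross-modal degradation filter needs under asymmetric degradation.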
Tianpei Zhang
Jufeng Zhao
Yiming Zhu
Guangmang Cui