Learning a Unified Degradation-aware Representation Model for Multi-modal Image Fusion

📅 2025-03-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing infrared–visible image fusion methods rely heavily on synthetically generated multi-modal, multi-quality paired data, limiting their generalizability to real-world degradation scenarios. To address this, we propose a degradation-aware unified representation learning framework. Our key contributions are: (1) a novel data-level disentanglement and re-coupling mechanism in a shared latent feature space, explicitly modeling cross-modal degradation discrepancies; (2) a unified loss function designed to support training on real degraded data; and (3) text-guided attention (TGA) to enhance semantic alignment between modalities and preserve fine-grained details. Integrating intra-residual architecture with degradation-aware joint optimization, our method achieves state-of-the-art performance across generic fusion, degradation-robust fusion, and downstream detection/segmentation tasks. Notably, it is the first to jointly realize realistic degradation modeling and high-fidelity fusion within a single unified framework.

Technology Category

Application Category

📝 Abstract
All-in-One Degradation-Aware Fusion Models (ADFMs), a class of multi-modal image fusion models, address complex scenes by mitigating degradations from source images and generating high-quality fused images. Mainstream ADFMs often rely on highly synthetic multi-modal multi-quality images for supervision, limiting their effectiveness in cross-modal and rare degradation scenarios. The inherent relationship among these multi-modal, multi-quality images of the same scene provides explicit supervision for training, but also raises above problems. To address these limitations, we present LURE, a Learning-driven Unified Representation model for infrared and visible Image Fusion, which is degradation-aware. LURE decouples multi-modal multi-quality data at the data level and recouples this relationship in a unified latent feature space (ULFS) by proposing a novel unified loss. This decoupling circumvents data-level limitations of prior models and allows leveraging real-world restoration datasets for training high-quality degradation-aware models, sidestepping above issues. To enhance text-image interaction, we refine image-text interaction and residual structures via Text-Guided Attention (TGA) and an inner residual structure. These enhances text's spatial perception of images and preserve more visual details. Experiments show our method outperforms state-of-the-art (SOTA) methods across general fusion, degradation-aware fusion, and downstream tasks. The code will be publicly available.
Problem

Research questions and friction points this paper is trying to address.

Addresses limitations in multi-modal image fusion models
Proposes a degradation-aware model for infrared and visible image fusion
Enhances text-image interaction and preserves visual details
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples multi-modal data at data level
Recouples data in unified latent feature space
Enhances text-image interaction via Text-Guided Attention
🔎 Similar Papers
No similar papers found.