MaeFuse: Transferring Omni Features with Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training

📅 2024-04-17
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing infrared and visible image fusion methods suffer from reliance on downstream-task fine-tuning, severe cross-modal domain shift, and blocking artifacts introduced by MAE-based encoders. To address these issues, this paper proposes MaeFuse—a novel fusion framework that pioneers the integration of pre-trained Masked Autoencoders (MAEs) into image fusion. It freezes the MAE encoder to extract generic, task-agnostic cross-modal features, eliminating dependence on task-specific supervision. A dual-stream feature alignment module and a guided progressive fusion training strategy are jointly designed to mitigate both domain shift and blocking artifacts. Additionally, a multi-scale feature reconstruction loss is introduced to enhance detail fidelity. Evaluated on benchmark datasets including TNO and RoadScene, MaeFuse achieves state-of-the-art performance, significantly improving target conspicuity and texture preservation while delivering high visual quality and strong compatibility with downstream vision tasks.
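
The listing includes no reference code; as a rough orientation only, the following PyTorch sketch shows the overall shape implied by the summary: a frozen MAE encoder shared by both modalities, a small trainable fusion layer, and a decoder. All module names, dimensions, and the concatenate-and-project fusion design are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MaeFuseSketch(nn.Module):
    """Hypothetical sketch of the MaeFuse pipeline: a frozen MAE encoder
    extracts token features from both modalities, a trainable fusion layer
    merges them, and a decoder maps the result back toward pixels."""

    def __init__(self, mae_encoder: nn.Module, embed_dim: int = 768):
        super().__init__()
        self.encoder = mae_encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # encoder stays frozen; only fusion/decoder train
        # Assumed fusion layer: concatenate token features, project back to embed_dim.
        self.fusion = nn.Sequential(
            nn.Linear(2 * embed_dim, embed_dim),
            nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )
        # Placeholder decoder mapping each token to a 16x16 RGB patch.
        self.decoder = nn.Linear(embed_dim, 16 * 16 * 3)

    def forward(self, ir: torch.Tensor, vis: torch.Tensor) -> torch.Tensor:
        f_ir = self.encoder(ir)    # (B, N, D) token features, infrared
        f_vis = self.encoder(vis)  # (B, N, D) token features, visible
        fused = self.fusion(torch.cat([f_ir, f_vis], dim=-1))
        return self.decoder(fused)  # per-patch pixels; unpatchify omitted for brevity
```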

📝 Abstract
In this paper, we introduce MaeFuse, a novel autoencoder model designed for Infrared and Visible Image Fusion (IVIF). Existing approaches to image fusion often rely on training combined with downstream tasks to obtain high-level visual information, which is effective in emphasizing target objects and delivers impressive results in visual quality and task-specific applications. Instead of being driven by downstream tasks, MaeFuse utilizes a pretrained encoder from Masked Autoencoders (MAE), which facilitates omni-feature extraction for low-level reconstruction and high-level vision tasks, to obtain perception-friendly features at low cost. To eliminate the domain gap between features of different modalities and the blocking artifacts introduced by the MAE encoder, we further develop a guided training strategy. This strategy is crafted to ensure that the fusion layer adjusts seamlessly to the encoder's feature space, gradually enhancing fusion performance. The proposed method facilitates the comprehensive integration of feature vectors from both infrared and visible modalities, preserving the rich details inherent in each. MaeFuse not only introduces a novel perspective in the realm of fusion techniques but also delivers impressive performance across multiple public datasets.
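
One plausible reading of the guided training strategy, continuing the sketch above: first pull the fusion layer's output toward a simple reference inside the frozen encoder's feature space, so the layer adapts to that space before fusion quality is optimized. The per-token maximum reference and the stage-2 stand-in loss below are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def guided_training_step(model, ir, vis, stage: int):
    """One hypothetical step of a two-stage guided training schedule for the
    MaeFuseSketch model above. Stage 1 aligns the fusion layer with the
    frozen encoder's feature space; stage 2 optimizes a fusion objective."""
    with torch.no_grad():
        f_ir = model.encoder(ir)    # frozen features, infrared
        f_vis = model.encoder(vis)  # frozen features, visible
    fused_feat = model.fusion(torch.cat([f_ir, f_vis], dim=-1))
    if stage == 1:
        # Guide fused features toward a naive per-token max of both modalities
        # (assumed reference; the paper's guidance target may differ).
        target = torch.maximum(f_ir, f_vis)
        return F.mse_loss(fused_feat, target)
    # Stage 2: keep fused features close to both modalities, a simple
    # stand-in for the paper's actual fusion objective.
    return F.mse_loss(fused_feat, f_ir) + F.mse_loss(fused_feat, f_vis)
```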
Problem

Research questions and friction points this paper is trying to address.

Infrared and Visible Image Fusion
Cross-modal domain gap between infrared and visible features
Blocking artifacts introduced by the MAE encoder
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pretrained Masked Autoencoders
Guided Training Strategy
Omni Features Extraction