FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba

📅 2024-04-15
🏛️ Visual Intelligence
📈 Citations: 10
Influential: 1
🤖 AI Summary
Existing CNNs suffer from limited global modeling capacity, while ViTs incur prohibitive computational overhead—both leading to incomplete information preservation and loss of fine details in multimodal image fusion. To address these limitations, this paper proposes a dynamic feature enhancement fusion framework built upon the Mamba architecture. Our key contributions are: (1) the first visual state-space model integrating dynamic convolution with channel-wise attention; (2) a Dynamic Feature Fusion Module (DFFM) that jointly enhances texture, perceives disparity, and models cross-modal correlations; and (3) a Cross-Modal Fusion Mamba module (CMFM) to improve inter-modal interaction efficiency. Evaluated on infrared–visible image fusion and other multimodal tasks, our method achieves state-of-the-art performance, significantly improving detail richness and structural fidelity of fused images, while also boosting downstream recognition accuracy.
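Contribution (1) integrates dynamic convolution with channel-wise attention inside the state-space block. As a rough illustration of the channel-attention half, here is a minimal squeeze-and-excitation-style channel gate in NumPy. The shapes, weight names (`w1`, `w2`), and reduction structure are generic assumptions for illustration, not the paper's actual module:

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """SE-style channel gate on a (C, H, W) feature map.

    feat: (C, H, W) feature tensor.
    w1: (C//r, C), w2: (C, C//r) -- weights of the two-layer gating
    MLP with reduction ratio r (illustrative shapes only).
    """
    # Squeeze: global average pool over spatial dims -> per-channel stats
    s = feat.mean(axis=(1, 2))                # (C,)
    # Excite: bottleneck MLP with ReLU, then sigmoid gate
    z = np.maximum(w1 @ s, 0.0)               # (C//r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ z)))    # (C,)
    # Reweight channels: emphasize informative channels, suppress redundant ones
    return feat * gate[:, None, None]
```

Gating channels this way is one standard route to the redundancy suppression the summary describes; the dynamic-convolution half (input-conditioned kernels) would sit alongside it in the enhanced Mamba block.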

📝 Abstract
Multimodal image fusion aims to integrate information from different imaging techniques to produce a comprehensive, detail-rich single image for downstream vision tasks. Existing methods based on local convolutional neural networks (CNNs) struggle to capture global features efficiently, while Transformer-based models excel at global modeling but are computationally expensive. Mamba addresses these limitations by leveraging selective structured state space models to handle long-range dependencies effectively while maintaining linear complexity. In this paper, we propose FusionMamba, a novel dynamic feature enhancement framework that aims to overcome the challenges faced by CNNs and Vision Transformers (ViTs) in computer vision tasks. The framework improves the visual state-space model Mamba by integrating dynamic convolution and channel attention mechanisms, which not only retains its powerful global feature modeling capability but also greatly reduces redundancy and enhances the expressiveness of local features. In addition, we develop a new module, the dynamic feature fusion module (DFFM). It combines the dynamic feature enhancement module (DFEM), for texture enhancement and disparity perception, with the cross-modal fusion Mamba module (CMFM), which strengthens inter-modal correlation while suppressing redundant information. Experiments show that FusionMamba achieves state-of-the-art performance across a variety of multimodal image fusion tasks as well as in downstream tasks, demonstrating its broad applicability and superiority.
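The linear-complexity claim rests on the state-space recurrence underlying Mamba: rather than attending over all pairs of positions, each step updates a fixed-size hidden state. A minimal NumPy sketch of a discretized (non-selective) SSM scan, with placeholder parameter names, not the paper's implementation:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Sequential state-space scan: h_t = A*h_{t-1} + B*x_t, y_t = C.h_t.

    x: (L,) input sequence; A, B, C: (N,) diagonal per-state parameters.
    Runs in O(L*N) time -- linear in sequence length L, unlike the
    O(L^2) pairwise cost of self-attention.
    """
    N = A.shape[0]
    h = np.zeros(N)                  # fixed-size hidden state
    y = np.empty_like(x, dtype=float)
    for t, xt in enumerate(x):
        h = A * h + B * xt           # state update from one input token
        y[t] = np.dot(C, h)          # readout
    return y
```

In Mamba, the parameters B, C, and the discretization step are made input-dependent ("selective"), which lets the model choose which tokens enter the state while keeping the same linear scan structure.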
Problem

Research questions and friction points this paper is trying to address.

Image Fusion
Global Information Processing
Multimodal Imaging
Innovation

Methods, ideas, or system contributions that make the work stand out.

FusionMamba
Dynamic Feature Fusion Module (DFFM)
Multi-modal Image Fusion