🤖 AI Summary
This work addresses general-purpose image fusion: the label-free integration of cross-modal images (e.g., infrared–visible, medical, multi-focus, multi-exposure) to improve downstream detection and segmentation performance without relying on modality-specific priors.
Method: We propose SMC-Mamba, the first self-supervised Multiplex Consensus Mamba architecture designed for general image fusion. It incorporates a modality-agnostic feature enhancement module and a novel Bi-level Self-supervised Contrastive Learning (BSCL) loss. By combining spatial-channel and frequency-domain rotational scanning with a multi-expert dynamic consensus mechanism, it preserves high-frequency details without additional computational overhead.
Results: Extensive experiments demonstrate state-of-the-art performance across four fundamental fusion tasks and their corresponding downstream vision tasks, validating the method’s strong generalization, computational efficiency, and practical applicability.
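The multi-expert dynamic consensus mechanism mentioned above can be illustrated with a minimal sketch: each expert produces a candidate feature vector, and a softmax over per-expert relevance scores yields consensus weights for the fused output. This is a hypothetical simplification for intuition only, not the paper's actual implementation; the function names (`softmax`, `consensus_fuse`) and the scalar gating scores are assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of gating scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def consensus_fuse(expert_outputs, expert_scores):
    """Blend per-expert feature vectors with softmax consensus weights.

    expert_outputs: list of equal-length feature vectors, one per expert.
    expert_scores:  per-expert relevance scores (e.g. learned gating logits,
                    assumed here as plain floats).
    """
    weights = softmax(expert_scores)
    dim = len(expert_outputs[0])
    fused = [0.0] * dim
    for w, out in zip(weights, expert_outputs):
        for i in range(dim):
            fused[i] += w * out[i]
    return fused
```

A higher score shifts the fused vector toward that expert's output, while the softmax keeps all experts contributing, which is the "consensus" behavior the summary describes.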
📝 Abstract
Image fusion integrates complementary information from different modalities to generate high-quality fused images, thereby benefiting downstream tasks such as object detection and semantic segmentation. Unlike task-specific techniques that primarily focus on consolidating inter-modal information, general image fusion must handle a wide range of tasks and improve performance without increasing complexity. To this end, we propose SMC-Mamba, a Self-supervised Multiplex Consensus Mamba framework for general image fusion. Specifically, the Modality-Agnostic Feature Enhancement (MAFE) module preserves fine details through adaptive gating and enhances global representations via spatial-channel and frequency-rotational scanning. The Multiplex Consensus Cross-modal Mamba (MCCM) module enables dynamic collaboration among experts, which reach a consensus to efficiently integrate complementary information from multiple modalities. Cross-modal scanning within MCCM further strengthens feature interactions across modalities, facilitating seamless integration of critical information from all sources. In addition, we introduce a Bi-level Self-supervised Contrastive Learning loss (BSCL), which preserves high-frequency information without increasing computational overhead while simultaneously boosting performance on downstream tasks. Extensive experiments demonstrate that our approach outperforms state-of-the-art (SOTA) image fusion algorithms on infrared-visible, medical, multi-focus, and multi-exposure fusion, as well as on downstream vision tasks.
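To give a concrete feel for a bi-level contrastive objective of the kind BSCL describes, here is a minimal sketch: an InfoNCE-style term applied at two levels (e.g., pixel-patch and feature), summed with a weighting factor. All names (`info_nce`, `bi_level_loss`, the weight `lam`, the temperature `tau`) are illustrative assumptions; the paper's actual loss formulation is not reproduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE loss: pull anchor toward the positive, push it from negatives."""
    logits = [cosine(anchor, positive) / tau]
    logits += [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # stabilize the log-sum-exp
    denom = sum(math.exp(l - m) for l in logits)
    return -(logits[0] - m - math.log(denom))

def bi_level_loss(pix_a, pix_p, pix_negs,
                  feat_a, feat_p, feat_negs, lam=0.5):
    """Hypothetical bi-level objective: pixel-level plus feature-level terms."""
    return info_nce(pix_a, pix_p, pix_negs) + lam * info_nce(feat_a, feat_p, feat_negs)
```

The loss is small when the anchor matches its positive and is dissimilar from the negatives, so minimizing it encourages the fused representation to retain the contrasted (e.g., high-frequency) content at both levels.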