🤖 AI Summary
Existing medical image fusion methods are constrained by fixed input modality counts, limiting adaptability to diverse clinical modality combinations. To address this, we propose the first end-to-end diffusion-based framework supporting arbitrary numbers of input modalities. Our method couples hierarchical Bayesian modeling with the diffusion process and embeds an Expectation-Maximization (EM) algorithm into the sampling stage for maximum likelihood estimation. Key components include modality-adaptive alignment, variable-length feature fusion, and uncertainty-aware reconstruction. On the Harvard multimodal dataset, our approach achieves state-of-the-art performance across all nine quantitative metrics for both two- and three-modality fusion tasks. Furthermore, cross-domain generalization experiments on infrared–visible, multi-exposure, and multi-focus imaging demonstrate significant improvements over prior methods. This work establishes a new paradigm for flexible, robust, and clinically deployable multimodal image fusion.
📝 Abstract
Different modalities of medical images provide unique physiological and anatomical information about diseases. Multi-modal medical image fusion integrates complementary information from medical images of different modalities, producing a fused image that comprehensively and objectively reflects lesion characteristics to assist doctors in clinical diagnosis. However, existing fusion methods can only handle a fixed number of modality inputs, such as accepting only two-modal or tri-modal inputs, and cannot directly process a varying number of inputs, which hinders their application in clinical settings. To tackle this issue, we introduce FlexiD-Fuse, a diffusion-based image fusion network designed to accommodate flexible quantities of input modalities. It performs two-modal and tri-modal medical image fusion end to end with the same set of weights. FlexiD-Fuse transforms the diffusion fusion problem, which supports only fixed-condition inputs, into a maximum likelihood estimation problem based on the diffusion process and hierarchical Bayesian modeling. By incorporating the Expectation-Maximization (EM) algorithm into the diffusion sampling iterations, FlexiD-Fuse can generate high-quality fused images with cross-modal information from the source images, regardless of the number of inputs. We compared our method with the latest two-modal and tri-modal medical image fusion methods on the Harvard dataset, evaluating them with nine popular metrics. The experimental results show that our method achieves the best performance in medical image fusion with varying inputs. We also conducted extensive extension experiments on infrared–visible, multi-exposure, and multi-focus image fusion tasks with arbitrary numbers of inputs, comparing against the respective SOTA methods. The results of these extension experiments consistently demonstrate the effectiveness and superiority of our method.
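The core idea described above, recasting fusion with an arbitrary number of conditioning images as maximum likelihood estimation solved by EM inside the diffusion sampler, can be illustrated with a minimal sketch. The code below assumes a simple Gaussian observation model in which each source image equals the latent fused image plus modality-specific noise; the function name `em_fusion_step`, the per-modality variance parameters, and the interface are hypothetical stand-ins for one reverse-diffusion step, not the paper's actual implementation.

```python
import numpy as np

def em_fusion_step(prior_mean, prior_var, sources, n_iters=3):
    """One hypothetical EM refinement inside a reverse-diffusion step.

    prior_mean : (H, W) diffusion-prior estimate of the clean fused image
    prior_var  : scalar variance of that prior at the current timestep
    sources    : list of source-modality images of any length, each (H, W)
    """
    # Per-modality noise variances: latent scale parameters of the
    # hierarchical model, initialised uniformly (an assumed choice).
    sigma2 = [1.0] * len(sources)
    x = prior_mean.copy()
    for _ in range(n_iters):
        # E-step: the posterior over the fused image is Gaussian; its mean
        # is a precision-weighted average of the diffusion prior and
        # however many source likelihoods are available.
        precision = 1.0 / prior_var + sum(1.0 / s2 for s2 in sigma2)
        x = (prior_mean / prior_var
             + sum(y / s2 for y, s2 in zip(sources, sigma2))) / precision
        post_var = 1.0 / precision
        # M-step: maximum-likelihood update of each modality's noise
        # variance (mean squared residual plus posterior variance).
        sigma2 = [float(np.mean((y - x) ** 2)) + post_var for y in sources]
    return x

# The same call handles two or three modalities, which is the
# length-agnostic property the paper targets; all images here are
# random placeholders.
denoised = np.random.rand(64, 64)  # stand-in for the denoiser's estimate
ct, mri, pet = (np.random.rand(64, 64) for _ in range(3))
fused_2 = em_fusion_step(denoised, prior_var=0.1, sources=[ct, mri])
fused_3 = em_fusion_step(denoised, prior_var=0.1, sources=[ct, mri, pet])
```

Because the E-step simply sums likelihood contributions over whatever sources are present, no architectural change or retraining is needed as the modality count varies; in the paper this refinement runs inside every diffusion sampling iteration rather than on a single denoised estimate as sketched here.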