FlexiD-Fuse: Flexible number of inputs multi-modal medical image fusion based on diffusion model

📅 2025-09-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing medical image fusion methods are constrained by fixed-input modality counts, limiting adaptability to diverse clinical modality combinations. To address this, we propose the first end-to-end diffusion-based framework supporting arbitrary numbers of input modalities. Our method innovatively couples hierarchical Bayesian modeling with the diffusion process and embeds an Expectation-Maximization (EM) algorithm into the sampling stage for maximum likelihood estimation. Key components include modality-adaptive alignment, variable-length feature fusion, and uncertainty-aware reconstruction. On the Harvard multimodal dataset, our approach achieves state-of-the-art performance across all nine quantitative metrics for both two- and three-modality fusion tasks. Furthermore, cross-domain generalization experiments—spanning infrared–visible light, multi-exposure, and multi-focus imaging—demonstrate significant improvements over prior art. This work establishes a new paradigm for flexible, robust, and clinically deployable multimodal image fusion.

📝 Abstract
Different modalities of medical images provide unique physiological and anatomical information for diseases. Multi-modal medical image fusion integrates useful information from complementary medical images of different modalities, producing a fused image that comprehensively and objectively reflects lesion characteristics to assist doctors in clinical diagnosis. However, existing fusion methods can only handle a fixed number of modality inputs, such as accepting only two-modal or tri-modal inputs, and cannot directly process varying input quantities, which hinders their application in clinical settings. To tackle this issue, we introduce FlexiD-Fuse, a diffusion-based image fusion network designed to accommodate flexible quantities of input modalities. It can process two-modal and tri-modal medical image fusion end-to-end with the same set of weights. FlexiD-Fuse transforms the diffusion fusion problem, which supports only fixed-condition inputs, into a maximum likelihood estimation problem based on the diffusion process and hierarchical Bayesian modeling. By incorporating the Expectation-Maximization algorithm into the diffusion sampling iteration process, FlexiD-Fuse can generate high-quality fused images with cross-modal information from the source images, independently of the number of input images. We compared against the latest two-modal and tri-modal medical image fusion methods on the Harvard dataset, evaluating with nine popular metrics. The experimental results show that our method achieves the best performance in medical image fusion with varying inputs. Meanwhile, we conducted extensive extension experiments on infrared-visible, multi-exposure, and multi-focus image fusion tasks with arbitrary numbers of inputs, comparing against the respective SOTA methods. The results of the extension experiments consistently demonstrate the effectiveness and superiority of our method.
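The abstract's core idea, an EM loop embedded in the reverse-diffusion sampling so the fused estimate is a maximum-likelihood combination of however many source images are given, can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the function names (`em_fused_estimate`, `sample_with_em`), the Gaussian likelihood model, and the linear noise schedule are hypothetical stand-ins, not the paper's actual architecture or schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

def em_fused_estimate(x_est, sources, sigma2=0.1, iters=5):
    """One EM refinement of the fused estimate given K source images.

    E-step: compute per-modality responsibilities from the likelihood of
    each source under the current fused estimate (Gaussian assumption).
    M-step: re-estimate the fused image as a responsibility-weighted mean.
    The loop never hard-codes K, so any number of inputs works.
    """
    weights = np.full(len(sources), 1.0 / len(sources))
    for _ in range(iters):
        # E-step: log-likelihood of each source under the current estimate
        ll = np.array([-np.mean((x_est - s) ** 2) / (2 * sigma2) for s in sources])
        ll -= ll.max()                      # numerical stability
        weights = np.exp(ll)
        weights /= weights.sum()
        # M-step: maximum-likelihood fused estimate under current weights
        x_est = sum(w * s for w, s in zip(weights, sources))
    return x_est, weights

def sample_with_em(sources, steps=10):
    """Toy reverse-diffusion loop: start from noise and denoise toward the
    EM-refined fused estimate (stand-in for the paper's sampling stage)."""
    x = rng.standard_normal(sources[0].shape)
    for t in range(steps, 0, -1):
        x_fused, _ = em_fused_estimate(x, sources)
        alpha = (t - 1) / steps             # hypothetical linear schedule
        x = alpha * x + (1 - alpha) * x_fused
    return x

# The same code path handles two or three "modalities" (random stand-ins)
imgs2 = [rng.random((8, 8)) for _ in range(2)]
imgs3 = [rng.random((8, 8)) for _ in range(3)]
fused2 = sample_with_em(imgs2)
fused3 = sample_with_em(imgs3)
```

The key property the sketch demonstrates is the one the abstract claims: because the E-step and M-step iterate over a list of sources rather than fixed conditioning slots, the sampling loop is identical for two-modal and tri-modal inputs under the same weights.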
Problem

Research questions and friction points this paper is trying to address.

Handles flexible input quantities for medical image fusion
Overcomes fixed modality limitations in existing fusion methods
Supports varying numbers of input images end-to-end
Innovation

Methods, ideas, or system contributions that make the work stand out.

Flexible input quantities via diffusion model
Maximum likelihood estimation with Bayesian modeling
Expectation-Maximization in diffusion sampling iteration
🔎 Similar Papers
2024-07-06 · IEEE Journal of Biomedical and Health Informatics · Citations: 1
Yushen Xu
Foshan University
Image Fusion, Computer Vision
Xiaosong Li
School of Physics and Optoelectronic Engineering, Foshan University, Foshan 528225, China; Guangdong-HongKong-Macao Joint Laboratory for Intelligent Micro-Nano Optoelectronic Technology, Foshan 528225, China; Guangdong Provincial Key Laboratory of Industrial Intelligent Inspection Technology, Foshan University, Foshan 528000, China
Yuchun Wang
School of Physics and Optoelectronic Engineering, Foshan University, Foshan 528225, China
Xiaoqi Cheng
Guangdong Provincial Key Laboratory of Industrial Intelligent Inspection Technology, Foshan University, Foshan 528000, China
Huafeng Li
KUST
Computer Vision, Pattern Recognition, Machine Learning
Haishu Tan
School of Physics and Optoelectronic Engineering, Foshan University, Foshan 528225, China