DM-FNet: Unified multimodal medical image fusion via diffusion process-trained encoder-decoder

📅 2025-06-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multimodal medical image fusion (MMIF) methods struggle to fully capture fine-grained details and model cross-modal interactions, limiting fusion quality. To address this, we propose a two-stage diffusion-model-driven unified fusion network: Stage I employs a diffusion-pretrained UNet to extract multiscale detail features; Stage II enhances feature discriminability via noise-step-conditioned input and introduces a three-module adaptive fusion mechanism. This work is the first to systematically embed the diffusion process into the MMIF framework, coupled with a hybrid loss function jointly optimizing brightness, color, contrast, and detail fidelity. Extensive experiments across diverse medical imaging modalities demonstrate significant improvements in objective metrics (e.g., PSNR, SSIM). The fused images preserve clinically plausible brightness, complete tracer distribution, rich texture, and sharp edges—substantially enhancing information density and diagnostic utility.
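A minimal PyTorch sketch may help make this two-stage recipe concrete. Everything below — TinyUNet, FusionHead, the crude linear noise schedule, and the frozen-feature handoff — is an illustrative assumption, not the authors' DM-FNet code (which is linked in the abstract): Stage I trains a denoising UNet, and Stage II freezes it, extracts features from each modality, and trains only a small fusion head.

```python
# Hypothetical sketch of the two-stage idea; TinyUNet, FusionHead, and the
# linear schedule are illustrative assumptions, not the released DM-FNet.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy denoising network; Stage I trains it to predict the added noise."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.dec = nn.Conv2d(ch, 1, 3, padding=1)

    def features(self, x_t):
        return self.enc(x_t)          # multiscale in the real model

    def forward(self, x_t):
        return self.dec(self.features(x_t))

def diffusion_pretrain_step(unet, x0, T=1000):
    """Stage I: one DDPM-style denoising step with a crude linear schedule."""
    t = torch.randint(1, T, (x0.size(0),))
    alpha_bar = (1.0 - t.float() / T).view(-1, 1, 1, 1)
    noise = torch.randn_like(x0)
    x_t = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * noise
    return nn.functional.mse_loss(unet(x_t), noise)

class FusionHead(nn.Module):
    """Stage II: fuse the two modalities' features into one image."""
    def __init__(self, ch=32):
        super().__init__()
        self.fuse = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1),
                                  nn.ReLU(), nn.Conv2d(ch, 1, 3, padding=1))

    def forward(self, fa, fb):
        return self.fuse(torch.cat([fa, fb], dim=1))

# Usage: pretrain the UNet, then freeze it and train only the fusion head.
unet, head = TinyUNet(), FusionHead()
mri, pet = torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64)
loss_stage1 = diffusion_pretrain_step(unet, mri)
with torch.no_grad():                          # frozen diffusion features
    fa, fb = unet.features(mri), unet.features(pet)
fused = head(fa, fb)
print(loss_stage1.item(), fused.shape)         # torch.Size([2, 1, 64, 64])
```

Freezing the Stage I weights is one plausible reading of "diffusion process-trained encoder-decoder"; the actual network may fine-tune them in Stage II.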

📝 Abstract
Multimodal medical image fusion (MMIF) extracts the most meaningful information from multiple source images, enabling a more comprehensive and accurate diagnosis. Achieving high-quality fusion results requires a careful balance of brightness, color, contrast, and detail; this ensures that the fused images effectively display relevant anatomical structures and reflect the functional status of the tissues. However, existing MMIF methods have limited capacity to capture detailed features during conventional training and suffer from insufficient cross-modal feature interaction, leading to suboptimal fused image quality. To address these issues, this study proposes a two-stage diffusion model-based fusion network (DM-FNet) to achieve unified MMIF. In Stage I, a diffusion process trains a UNet for image reconstruction. The UNet captures detailed information through progressive denoising and represents multilevel data, providing a rich set of feature representations for the subsequent fusion network. In Stage II, noisy images at various steps are input into the fusion network to enhance the model's feature recognition capability. Three key fusion modules are also integrated to process medical images from different modalities adaptively. Ultimately, the robust network structure and a hybrid loss function are integrated to harmonize the fused image's brightness, color, contrast, and detail, enhancing its quality and information density. Experimental results across various medical image types demonstrate that the proposed method performs exceptionally well on objective evaluation metrics. The fused images preserve appropriate brightness, a comprehensive distribution of radioactive tracers, rich textures, and clear edges. The code is available at https://github.com/HeDan-11/DM-FNet.
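The abstract's Stage II idea — feeding noisy images at various diffusion steps into the fusion network — can be sketched in a few self-contained lines. The linear schedule, the choice of steps, and the toy convolutional extractor below are assumptions for illustration, not the released implementation.

```python
# Illustrative sketch of noise-step-conditioned inputs (Stage II idea).
# Schedule, step choices, and the toy extractor are assumptions.
import torch
import torch.nn as nn

extractor = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())

def multi_step_features(extract, x0, steps=(10, 200, 500), T=1000):
    """Diffuse x0 to several noise levels and concatenate the features."""
    feats = []
    for t in steps:
        alpha_bar = 1.0 - t / T                 # crude linear schedule
        x_t = (alpha_bar ** 0.5) * x0 + ((1 - alpha_bar) ** 0.5) * torch.randn_like(x0)
        feats.append(extract(x_t))
    return torch.cat(feats, dim=1)              # [B, len(steps)*16, H, W]

x = torch.rand(1, 1, 64, 64)
print(multi_step_features(extractor, x).shape)  # torch.Size([1, 48, 64, 64])
```

Seeing the same image at several noise levels gives the fusion network both coarse structure (high noise) and fine detail (low noise) to draw on, which is one way to read the abstract's claim of enhanced feature recognition.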
Problem

Research questions and friction points this paper is trying to address.

Improves multimodal medical image fusion quality
Enhances cross-modal feature interaction
Balances brightness, color, contrast, and detail
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage diffusion model for unified MMIF
UNet trained via diffusion for detailed reconstruction
Hybrid loss harmonizes brightness, color, contrast, and detail (sketched after this list)
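A hedged sketch of a hybrid fusion loss in the spirit of the bullet above: the intensity, contrast, and gradient terms and their weights are assumptions, not the paper's exact formulation (a color term, e.g. on pseudo-color functional images, would be added analogously).

```python
# Hedged sketch of a hybrid fusion loss; terms and weights are assumptions.
import torch
import torch.nn.functional as F

def gradient_magnitude(img):
    """Mean absolute finite-difference gradient, a cheap detail proxy."""
    dx = img[..., :, 1:] - img[..., :, :-1]
    dy = img[..., 1:, :] - img[..., :-1, :]
    return dx.abs().mean() + dy.abs().mean()

def hybrid_fusion_loss(fused, src_a, src_b, w=(1.0, 1.0, 1.0)):
    # Brightness/intensity: follow the brighter source at each pixel
    l_int = F.l1_loss(fused, torch.maximum(src_a, src_b))
    # Contrast: match the average standard deviation of the sources
    l_con = (fused.std() - 0.5 * (src_a.std() + src_b.std())).abs()
    # Detail: keep gradients at least as strong as the sharper source
    l_det = (gradient_magnitude(fused)
             - torch.maximum(gradient_magnitude(src_a),
                             gradient_magnitude(src_b))).abs()
    return w[0] * l_int + w[1] * l_con + w[2] * l_det

fused = torch.rand(2, 1, 64, 64, requires_grad=True)
a, b = torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64)
print(hybrid_fusion_loss(fused, a, b).item())
```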
Dan He
School of Computer Science and Technology, Chongqing University of Posts and Telecommunications
Weisheng Li
Chongqing University of Posts and Telecommunications
Image processing, pattern recognition, machine learning, big data, intelligent computing
Guofen Wang
College of Computer and Information Science, Chongqing Normal University
Yuping Huang
School of Computer Science and Technology, Chongqing University of Posts and Telecommunications
Shiqiang Liu
School of Computer Science and Technology, Chongqing University of Posts and Telecommunications