TriFusion-SR: Joint Tri-Modal Medical Image Fusion and SR

📅 2026-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes TriFusion-SR, a wavelet-guided conditional diffusion framework that jointly models tri-modal medical image fusion and super-resolution, addressing the artifacts and perceptual degradation commonly caused by sequential processing, particularly in MRI/CT/PET scenarios where frequency-domain imbalance is pronounced. By explicitly decomposing features into frequency bands via the 2D discrete wavelet transform, the method introduces Rectified Wavelet Features (RWF) for frequency-domain calibration and an Adaptive Spatial-Frequency Fusion (ASFF) module with gated channel-spatial attention for structure-aware cross-modal interaction. Extensive experiments show that TriFusion-SR outperforms existing approaches across multiple upscaling factors, achieving PSNR gains of 4.8–12.4% and substantial reductions in RMSE and LPIPS, thereby improving both the fidelity and perceptual quality of the fused images.
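The reported gains are stated in PSNR and RMSE, both simple functions of pixel-wise error; a minimal sketch of the two metrics (assuming images scaled to [0, max_val]):

```python
import numpy as np

def rmse(ref, test):
    """Root-mean-square error between a reference and a test image."""
    return float(np.sqrt(np.mean((ref - test) ** 2)))

def psnr(ref, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    e = rmse(ref, test)
    return float('inf') if e == 0 else 20.0 * np.log10(max_val / e)
```

Note that a "4.8–12.4% PSNR improvement" is relative on the dB scale, so the absolute gain depends on the baseline's PSNR.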

📝 Abstract
Multimodal medical image fusion facilitates comprehensive diagnosis by aggregating complementary structural and functional information, but its effectiveness is limited by resolution degradation and modality discrepancies. Existing approaches typically perform image fusion and super-resolution (SR) in separate stages, leading to artifacts and degraded perceptual quality. These limitations are further amplified in tri-modal settings that combine anatomical modalities (e.g., MRI, CT) with functional scans (e.g., PET, SPECT) due to pronounced frequency-domain imbalances. We propose TriFusion-SR, a wavelet-guided conditional diffusion framework for joint tri-modal fusion and SR. The framework explicitly decomposes multimodal features into frequency bands using the 2D Discrete Wavelet Transform, enabling frequency-aware cross-modal interaction. We further introduce a Rectified Wavelet Features (RWF) strategy for latent coefficient calibration, followed by an Adaptive Spatial-Frequency Fusion (ASFF) module with gated channel-spatial attention to enable structure-driven multimodal refinement. Extensive experiments demonstrate state-of-the-art performance, achieving 4.8–12.4% PSNR improvement and substantial reductions in RMSE and LPIPS across multiple upsampling scales.
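The frequency-band decomposition in the abstract rests on the 2D DWT; a minimal single-level sketch, assuming the Haar wavelet (the paper does not specify the basis), showing how an image splits into LL/LH/HL/HH sub-bands and reconstructs exactly:

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar DWT: split an image into LL, LH, HL, HH bands.

    x: 2D array with even height and width.
    Returns four sub-bands, each half the spatial size of x.
    """
    # Pairwise averages/differences along rows (horizontal low/high pass).
    lo_r = (x[:, 0::2] + x[:, 1::2]) / 2.0
    hi_r = (x[:, 0::2] - x[:, 1::2]) / 2.0
    # Same filtering along columns (vertical low/high pass).
    ll = (lo_r[0::2, :] + lo_r[1::2, :]) / 2.0
    lh = (lo_r[0::2, :] - lo_r[1::2, :]) / 2.0
    hl = (hi_r[0::2, :] + hi_r[1::2, :]) / 2.0
    hh = (hi_r[0::2, :] - hi_r[1::2, :]) / 2.0
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: perfect reconstruction of the input image."""
    h, w = ll.shape
    lo_r = np.empty((2 * h, w))
    hi_r = np.empty((2 * h, w))
    lo_r[0::2, :] = ll + lh
    lo_r[1::2, :] = ll - lh
    hi_r[0::2, :] = hl + hh
    hi_r[1::2, :] = hl - hh
    x = np.empty((2 * h, 2 * w))
    x[:, 0::2] = lo_r + hi_r
    x[:, 1::2] = lo_r - hi_r
    return x
```

The LL band carries the smooth anatomy that dominates SR fidelity, while LH/HL/HH carry edges and texture, which is what makes per-band calibration (as in RWF) possible in the first place.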
Problem

Research questions and friction points this paper is trying to address.

medical image fusion
super-resolution
tri-modal imaging
modality discrepancy
resolution degradation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tri-modal fusion
Super-resolution
Wavelet-guided diffusion
Frequency-aware interaction
Adaptive Spatial-Frequency Fusion
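The ASFF module is only named here, not specified; its gated channel-spatial attention can be illustrated with a hypothetical sketch in which a channel gate (squeezed over space) and a spatial gate (squeezed over channels) jointly weight two modality feature maps. The function names and gate parameterization below are assumptions, not the paper's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(feat_a, feat_b, w_ch, w_sp):
    """Hypothetical gated channel-spatial attention fusing two modalities.

    feat_a, feat_b: (C, H, W) feature maps from two modalities.
    w_ch: (C, C) learnable weights of the channel gate (assumed form).
    w_sp: scalar weight of the spatial gate (assumed form).
    Returns a fused (C, H, W) map.
    """
    diff = feat_a - feat_b
    # Channel gate: squeeze spatial dims, then linear layer + sigmoid.
    ch_stat = diff.mean(axis=(1, 2))                    # (C,)
    ch_gate = sigmoid(w_ch @ ch_stat)[:, None, None]    # (C, 1, 1)
    # Spatial gate: squeeze channel dim, scale, sigmoid.
    sp_stat = diff.mean(axis=0, keepdims=True)          # (1, H, W)
    sp_gate = sigmoid(w_sp * sp_stat)                   # (1, H, W)
    gate = ch_gate * sp_gate                            # broadcasts to (C, H, W)
    return gate * feat_a + (1.0 - gate) * feat_b
```

The convex combination keeps the fused map in the span of the two inputs, so where the modalities agree the gate is inert and the output matches both.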
Fayaz Ali Dharejo
University of Würzburg, Germany
Sharif S. M. A.
Independent Researcher
Aiman Khalil
Mehran UET, Pakistan
Nachiket Chaudhary
University of Würzburg, Germany
Rizwan Ali Naqvi
Sejong University, Republic of Korea
Radu Timofte
Humboldt Professor for AI and Computer Vision, University of Würzburg
Computer Vision · Machine Learning · AI · Compression · Computational Photography