TriFusion-SR: Joint Tri-Modal Medical Image Fusion and SR

📅 2026-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes TriFusion-SR, a wavelet-guided conditional diffusion framework that jointly models tri-modal medical image fusion and super-resolution, addressing the artifacts and perceptual degradation commonly caused by sequential processing, particularly in MRI/CT/PET scenarios where frequency-domain imbalance is pronounced. By explicitly decomposing features into frequency bands via the 2D discrete wavelet transform, the method introduces Rectified Wavelet Features (RWF) for frequency-domain calibration and an Adaptive Spatial-Frequency Fusion (ASFF) module with gated channel-spatial attention for structure-aware cross-modal interaction. Extensive experiments show that TriFusion-SR outperforms existing approaches across multiple upscaling factors, achieving PSNR gains of 4.8–12.4% and substantial reductions in RMSE and LPIPS, thereby improving both the fidelity and perceptual quality of the fused images.
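The reported gains are stated in PSNR and RMSE, both simple functions of pixel-wise error; a minimal sketch of the two metrics (assuming images scaled to [0, max_val]):

```python
import numpy as np

def rmse(ref, test):
    """Root-mean-square error between a reference and a test image."""
    return float(np.sqrt(np.mean((ref - test) ** 2)))

def psnr(ref, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    e = rmse(ref, test)
    return float('inf') if e == 0 else 20.0 * np.log10(max_val / e)
```

Note that a "4.8–12.4% PSNR improvement" is relative on the dB scale, so the absolute gain depends on the baseline's PSNR.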

📝 Abstract
Multimodal medical image fusion facilitates comprehensive diagnosis by aggregating complementary structural and functional information, but its effectiveness is limited by resolution degradation and modality discrepancies. Existing approaches typically perform image fusion and super-resolution (SR) in separate stages, leading to artifacts and degraded perceptual quality. These limitations are further amplified in tri-modal settings that combine anatomical modalities (e.g., MRI, CT) with functional scans (e.g., PET, SPECT) due to pronounced frequency-domain imbalances. We propose TriFusion-SR, a wavelet-guided conditional diffusion framework for joint tri-modal fusion and SR. The framework explicitly decomposes multimodal features into frequency bands using the 2D Discrete Wavelet Transform, enabling frequency-aware cross-modal interaction. We further introduce a Rectified Wavelet Features (RWF) strategy for latent coefficient calibration, followed by an Adaptive Spatial-Frequency Fusion (ASFF) module with gated channel-spatial attention to enable structure-driven multimodal refinement. Extensive experiments demonstrate state-of-the-art performance, achieving 4.8–12.4% PSNR improvement and substantial reductions in RMSE and LPIPS across multiple upsampling scales.
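The frequency-band decomposition in the abstract rests on the 2D DWT; a minimal single-level sketch, assuming the Haar wavelet (the paper does not specify the basis), showing how an image splits into LL/LH/HL/HH sub-bands and reconstructs exactly:

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar DWT: split an image into LL, LH, HL, HH bands.

    x: 2D array with even height and width.
    Returns four sub-bands, each half the spatial size of x.
    """
    # Pairwise averages/differences along rows (horizontal low/high pass).
    lo_r = (x[:, 0::2] + x[:, 1::2]) / 2.0
    hi_r = (x[:, 0::2] - x[:, 1::2]) / 2.0
    # Same filtering along columns (vertical low/high pass).
    ll = (lo_r[0::2, :] + lo_r[1::2, :]) / 2.0
    lh = (lo_r[0::2, :] - lo_r[1::2, :]) / 2.0
    hl = (hi_r[0::2, :] + hi_r[1::2, :]) / 2.0
    hh = (hi_r[0::2, :] - hi_r[1::2, :]) / 2.0
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: perfect reconstruction of the input image."""
    h, w = ll.shape
    lo_r = np.empty((2 * h, w))
    hi_r = np.empty((2 * h, w))
    lo_r[0::2, :] = ll + lh
    lo_r[1::2, :] = ll - lh
    hi_r[0::2, :] = hl + hh
    hi_r[1::2, :] = hl - hh
    x = np.empty((2 * h, 2 * w))
    x[:, 0::2] = lo_r + hi_r
    x[:, 1::2] = lo_r - hi_r
    return x
```

The LL band carries the smooth anatomy that dominates SR fidelity, while LH/HL/HH carry edges and texture, which is what makes per-band calibration (as in RWF) possible in the first place.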
Problem

Research questions and friction points this paper is trying to address.

medical image fusion
super-resolution
tri-modal imaging
modality discrepancy
resolution degradation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tri-modal fusion
Super-resolution
Wavelet-guided diffusion
Frequency-aware interaction
Adaptive Spatial-Frequency Fusion
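The ASFF module is only named here, not specified; its gated channel-spatial attention can be illustrated with a hypothetical sketch in which a channel gate (squeezed over space) and a spatial gate (squeezed over channels) jointly weight two modality feature maps. The function names and gate parameterization below are assumptions, not the paper's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(feat_a, feat_b, w_ch, w_sp):
    """Hypothetical gated channel-spatial attention fusing two modalities.

    feat_a, feat_b: (C, H, W) feature maps from two modalities.
    w_ch: (C, C) learnable weights of the channel gate (assumed form).
    w_sp: scalar weight of the spatial gate (assumed form).
    Returns a fused (C, H, W) map.
    """
    diff = feat_a - feat_b
    # Channel gate: squeeze spatial dims, then linear layer + sigmoid.
    ch_stat = diff.mean(axis=(1, 2))                    # (C,)
    ch_gate = sigmoid(w_ch @ ch_stat)[:, None, None]    # (C, 1, 1)
    # Spatial gate: squeeze channel dim, scale, sigmoid.
    sp_stat = diff.mean(axis=0, keepdims=True)          # (1, H, W)
    sp_gate = sigmoid(w_sp * sp_stat)                   # (1, H, W)
    gate = ch_gate * sp_gate                            # broadcasts to (C, H, W)
    return gate * feat_a + (1.0 - gate) * feat_b
```

The convex combination keeps the fused map in the span of the two inputs, so where the modalities agree the gate is inert and the output matches both.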
Fayaz Ali Dharejo
University of Würzburg, Germany
Sharif S. M. A.
Independent Researcher
Aiman Khalil
Mehran UET, Pakistan
Nachiket Chaudhary
University of Würzburg, Germany
Rizwan Ali Naqvi
Sejong University, Republic of Korea
Radu Timofte
Humboldt Professor for AI and Computer Vision, University of Würzburg
Computer Vision · Machine Learning · AI · Compression · Computational Photography