🤖 AI Summary
Medical MRI segmentation is hindered by the scarcity of high-quality annotations, which causes severe performance degradation under few-shot training regimes. To address this, we propose ZECO, a ZeroFusion-guided 3D conditional generative framework that, for the first time, progressively models the cross-modal mapping from segmentation masks to MRI volumes in latent space. ZECO incorporates a Spatial Transformation Module and a 3D spatial attention mechanism to explicitly capture inter-slice dependencies in volumetric data, significantly improving generalization and robustness under extreme data scarcity. The framework integrates diffusion modeling, joint multimodal encoding, and a novel cross-modal alignment strategy to enable high-fidelity, synchronized synthesis of MRI volumes and their corresponding 3D segmentation masks. Evaluated on multiple brain MRI benchmarks, ZECO achieves state-of-the-art PSNR, SSIM, and Dice scores, with substantial improvements in both quantitative metrics and visual quality.
📝 Abstract
Medical image segmentation is crucial for enhancing diagnostic accuracy and treatment planning in Magnetic Resonance Imaging (MRI). However, acquiring precise lesion masks for segmentation model training demands specialized expertise and significant time investment, leading to small-scale datasets in clinical practice. In this paper, we present ZECO, a ZeroFusion-guided 3D MRI conditional generation framework that extracts, compresses, and generates high-fidelity MRI images with corresponding 3D segmentation masks to mitigate data scarcity. To effectively capture inter-slice relationships within volumes, we introduce a Spatial Transformation Module that encodes MRI images into a compact latent space for the diffusion process. Moving beyond unconditional generation, our novel ZeroFusion method progressively maps 3D masks to MRI images in latent space, enabling robust training on limited datasets while avoiding overfitting. ZECO outperforms state-of-the-art models in both quantitative and qualitative evaluations on brain MRI datasets across various modalities, showcasing its exceptional capability in synthesizing high-quality MRI images conditioned on segmentation masks.
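The core idea of mask-conditioned latent diffusion can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it assumes a toy DDPM-style forward process with a hypothetical linear noise schedule, random tensors standing in for the latents produced by the Spatial Transformation Module, and mask conditioning via simple channel concatenation (one plausible reading of "mapping 3D masks to MRI images in latent space").

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy DDPM forward process with a hypothetical linear beta schedule;
# the paper's actual schedule and timestep count are not specified here.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def q_sample(z0, t, eps):
    """Forward diffusion: z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bars[t]) * z0 + np.sqrt(1.0 - alpha_bars[t]) * eps

# Stand-ins for latents: a compressed MRI volume latent and its 3D
# segmentation-mask latent, shaped (channels, depth, height, width).
z_mri = rng.standard_normal((4, 8, 8, 8))
z_mask = rng.integers(0, 2, (1, 8, 8, 8)).astype(np.float64)

# Noise the MRI latent at an intermediate timestep.
t = 500
eps = rng.standard_normal(z_mri.shape)
z_t = q_sample(z_mri, t, eps)

# Mask conditioning by channel concatenation: a denoising network would
# receive the noisy MRI latent together with the mask latent at each step.
denoiser_input = np.concatenate([z_t, z_mask], axis=0)
print(denoiser_input.shape)
```

Concatenating the mask latent along the channel axis is only one common conditioning mechanism; the framework's cross-modal alignment strategy may combine the modalities differently.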