MAISI-v2: Accelerated 3D High-Resolution Medical Image Synthesis with Rectified Flow and Region-specific Contrastive Loss

📅 2025-08-07
🤖 AI Summary
To address three key challenges in medical image synthesis—poor generalization, slow inference, and weak conditional alignment—this paper proposes the first rectified flow-accelerated generative framework tailored for 3D high-resolution medical images. Methodologically, it integrates latent diffusion models with rectified flows to enable rapid sampling; introduces a region-specific contrastive loss to enhance anatomical fidelity in regions of interest; and employs a 3D convolutional backbone to improve spatial modeling capacity. Experiments demonstrate strong cross-modality and cross-anatomy generalization, state-of-the-art image quality, and a 33× speedup in inference over conventional latent diffusion models. Moreover, generated images significantly boost downstream segmentation performance, validating the framework’s practical utility for medical data augmentation.

📝 Abstract
Medical image synthesis is an important topic for both clinical and research applications. Recently, diffusion models have become a leading approach in this area. Despite their strengths, many existing methods struggle with (1) limited generalizability, working only for specific body regions or voxel spacings, (2) slow inference, a common issue for diffusion models, and (3) weak alignment with input conditions, a critical issue for medical imaging. MAISI, a previously proposed framework, addresses the generalizability issue but still suffers from slow inference and limited condition consistency. In this work, we present MAISI-v2, the first accelerated 3D medical image synthesis framework that integrates rectified flow to enable fast, high-quality generation. To further improve condition fidelity, we introduce a novel region-specific contrastive loss that increases sensitivity to the region of interest. Our experiments show that MAISI-v2 achieves SOTA image quality with a $33\times$ acceleration of the latent diffusion model. We also conducted a downstream segmentation experiment to show that the synthetic images can be used for data augmentation. We release our code, training details, model weights, and a GUI demo to facilitate reproducibility and promote further development within the community.
Problem

Research questions and friction points this paper is trying to address.

Limited generalizability across body regions and voxel spacings
Slow inference speed in diffusion models
Weak alignment with input conditions in medical imaging
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rectified flow for fast 3D image synthesis
Region-specific contrastive loss for condition fidelity
33× acceleration of the latent diffusion model
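
The link between rectified flow and fast sampling can be illustrated with a minimal sketch. This is not the MAISI-v2 sampler: the function names, the toy velocity field, and the tensor shape are all assumptions made for illustration. The general idea is that a rectified flow model predicts a velocity along a near-straight path from noise (t = 0) to data (t = 1), so a plain Euler integrator can cover that path in very few steps.

```python
import numpy as np


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 (noise) to t=1 (sample)."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # the last evaluated t is (num_steps - 1) / num_steps, never 1
        x = x + dt * velocity_fn(x, t)
    return x


# Hypothetical stand-in for a trained network. For a data distribution made of
# a single point `target`, the ideal rectified-flow velocity is
# (target - x) / (1 - t), which is constant along each straight trajectory.
target = np.full((4, 4, 4), 2.0)  # toy 3D "latent volume" (assumed shape)


def toy_velocity(x, t):
    return (target - x) / (1.0 - t)


rng = np.random.default_rng(0)
noise = rng.standard_normal(target.shape)

one_step = euler_sample(toy_velocity, noise, num_steps=1)
few_steps = euler_sample(toy_velocity, noise, num_steps=5)
```

In this toy setting the trajectories are exactly straight, so one Euler step already lands on the target. A trained model only approximates straight paths, so in practice the step count trades speed against quality.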