🤖 AI Summary
Medical imaging analysis faces challenges including data scarcity, high annotation costs, and privacy sensitivity. To address these, we propose the first latent-space diffusion framework for 3D CT synthesis, integrating a Foundation Volume Compression Network with a ControlNet-based conditional control mechanism—enabling flexible voxel resolution and high-fidelity generation up to 512×512×768. Our method is the first to achieve precise anatomical control via segmentation maps of 127 organ classes. By incorporating multi-scale anatomical priors, the synthesized CT volumes demonstrate superior anatomical consistency, textural realism, and downstream task performance (e.g., segmentation and registration) compared to state-of-the-art methods. The framework significantly enhances few-shot model training and facilitates privacy-preserving medical AI development.
📝 Abstract
Medical imaging analysis faces challenges such as data scarcity, high annotation costs, and privacy concerns. This paper introduces the Medical AI for Synthetic Imaging (MAISI), an innovative approach using the diffusion model to generate synthetic 3D computed tomography (CT) images to address those challenges. MAISI leverages the foundation volume compression network and the latent diffusion model to produce high-resolution CT images (up to a landmark volume dimension of 512 × 512 × 768) with flexible volume dimensions and voxel spacing. By incorporating ControlNet, MAISI can process organ segmentation, including 127 anatomical structures, as additional conditions and enables the generation of accurately annotated synthetic images that can be used for various downstream tasks. Our experiment results show that MAISI's capabilities in generating realistic, anatomically accurate images for diverse regions and conditions reveal its promising potential to mitigate challenges using synthetic data.