🤖 AI Summary
To address the dual challenges of data scarcity and privacy preservation in medical imaging, this paper proposes Med-LSDM, described as the first latent-space diffusion model tailored for 3D semantic image synthesis. Med-LSDM conditions synthesis on de-identified semantic maps and runs the diffusion process in the compressed latent space of a pre-trained VQ-GAN, which avoids the prohibitive computational cost of voxel-level diffusion while preserving anatomical detail. Evaluated on the Duke Breast dataset, Med-LSDM achieves a 3D-FID of 0.0054 and a Dice score of 0.7096, close to the 0.7150 obtained on real images, indicating a small domain gap between synthetic and real data. These results suggest that the synthetic volumes can serve as privacy-preserving data augmentation.
📝 Abstract
In the medical domain, acquiring large datasets is challenging due to both accessibility issues and stringent privacy regulations. Consequently, data availability and privacy protection are major obstacles to applying machine learning in medical imaging. To address this, our study proposes Med-LSDM (Latent Semantic Diffusion Model), which operates directly in the 3D domain and uses de-identified semantic maps to generate synthetic data for both privacy preservation and data augmentation. Unlike many existing methods that generate 2D slices, Med-LSDM is designed specifically for 3D semantic image synthesis, making it well suited to applications that require full volumetric data. Med-LSDM guides 3D image generation with the semantic map by applying a diffusion model within the latent space of a pre-trained VQ-GAN. Operating in this compressed latent space significantly reduces computational cost while preserving critical 3D spatial details. Our approach demonstrates strong performance in 3D semantic medical image synthesis, achieving a 3D-FID score of 0.0054 on the conditional Duke Breast dataset and a Dice score (0.70964) close to that of real images (0.71496). These results indicate that the synthetic data from our model have a small domain gap with real data and are useful for data augmentation.
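The Dice scores quoted above measure the overlap between two segmentation masks, computed as twice the number of shared foreground voxels divided by the total foreground voxels in both masks. The following is a minimal sketch of that formula for binary 3D masks, assuming NumPy arrays of equal shape. The function name `dice_score` and the toy volumes are illustrative only; the abstract does not describe the paper's actual evaluation pipeline (for example, which segmentation model is used or how multiple classes are averaged).

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient between two binary masks of the same shape.

    Hypothetical helper for illustration; not taken from the paper.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")

    # Voxels labelled foreground in both masks.
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()

    # Convention (an assumption): two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return float(2.0 * intersection / total)


# Toy 3D volumes: an 8-voxel cube and a 12-voxel box that contains it.
real = np.zeros((4, 4, 4), dtype=bool)
real[1:3, 1:3, 1:3] = True
synth = np.zeros((4, 4, 4), dtype=bool)
synth[1:3, 1:3, 1:4] = True

print(dice_score(synth, real))  # 2 * 8 / (8 + 12) = 0.8
```

A score of 1.0 means the masks coincide exactly and 0.0 means they share no foreground voxels, so the reported 0.70964 versus 0.71496 are read on that scale.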