🤖 AI Summary
High construction and annotation costs for 3D medical imaging datasets hinder scalable model development. To address this, we propose GuideGen, a controllable framework that jointly generates full-torso (chest-to-pelvis) anatomical masks and the corresponding CT volumes from free-form text prompts. GuideGen combines three components: a text-conditional semantic synthesizer that produces the anatomical masks, a contrast-aware autoencoder that extracts high-fidelity features across varying contrast levels, and a latent feature generator that keeps the generated CT aligned with both the anatomical semantics and the input prompt. For training and evaluation, we compile a multi-modality cancer imaging dataset of paired CT scans and clinical descriptions, drawn from 12 public TCIA datasets and one private real-world dataset. Evaluations of generation quality, cross-modality alignment, and data usability on downstream multi-organ and tumor segmentation tasks show that GuideGen outperforms existing CT generation methods.
📝 Abstract
Recently emerging conditional diffusion models appear promising for reducing the labor and expense of building large 3D medical imaging datasets. However, previous studies on 3D CT generation have yet to fully capitalize on semantic and textual conditions, and they have primarily focused on specific organs characterized by local structure and fixed contrast. In this work, we present GuideGen, a controllable framework that generates anatomical masks and corresponding CT volumes for the entire torso, from chest to pelvis, based on free-form text prompts. Our approach includes three core components: a text-conditional semantic synthesizer for creating realistic full-torso anatomies; a contrast-aware autoencoder for detailed, high-fidelity feature extraction across varying contrast levels; and a latent feature generator that ensures alignment between CT images, anatomical semantics, and input prompts. To train and evaluate GuideGen, we compile a multi-modality cancer imaging dataset with paired CT and clinical descriptions from 12 public TCIA datasets and one private real-world dataset. Comprehensive evaluations of generation quality, cross-modality alignment, and data usability on multi-organ and tumor segmentation tasks demonstrate GuideGen's superiority over existing CT generation methods.