GuideGen: A Text-Guided Framework for Full-torso Anatomy and CT Volume Generation

📅 2024-03-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
High construction and annotation costs for 3D medical imaging datasets hinder scalable model development. To address this, we propose GuideGen—the first framework enabling joint generation of full-torso (thorax-to-pelvis) anatomical masks and high-fidelity CT volumes conditioned on free-text prompts. Methodologically, GuideGen integrates a text-conditioned diffusion model, a contrast-aware autoencoder, and a latent feature alignment generator to establish a semantic–contrast–text triple-alignment mechanism. We introduce the first cross-modal cancer CT–text paired benchmark dataset and adopt a hybrid training strategy leveraging both TCIA and private clinical data. Experiments demonstrate that GuideGen significantly outperforms state-of-the-art methods in generation fidelity, cross-modal alignment accuracy, and downstream multi-organ/tumor segmentation performance. It markedly improves controllability and clinical utility of synthetic data, enabling precise, anatomy-aware, text-driven CT synthesis.

Technology Category

Application Category

📝 Abstract
The recently emerging conditional diffusion models seem promising for mitigating the labor and expenses in building large 3D medical imaging datasets. However, previous studies on 3D CT generation have yet to fully capitalize on semantic and textual conditions, and they have primarily focused on specific organs characterized by a local structure and fixed contrast. In this work, we present GuideGen, a controllable framework that generates anatomical masks and corresponding CT volumes for the entire torso-from chest to pelvis-based on free-form text prompts. Our approach includes three core components: a text-conditional semantic synthesizer for creating realistic full-torso anatomies; a contrast-aware autoencoder for detailed, high-fidelity feature extraction across varying contrast levels; and a latent feature generator that ensures alignment between CT images, anatomical semantics and input prompts. To train and evaluate GuideGen, we compile a multi-modality cancer imaging dataset with paired CT and clinical descriptions from 12 public TCIA datasets and one private real-world dataset. Comprehensive evaluations across generation quality, cross-modality alignment, and data usability on multi-organ and tumor segmentation tasks demonstrate GuideGen's superiority over existing CT generation methods.
Problem

Research questions and friction points this paper is trying to address.

Generates full-torso anatomical masks and CT volumes from text prompts
Addresses limitations of prior 3D CT generation focusing on specific organs
Enables data synthesis for segmentation tasks using textual instructions only
Innovation

Methods, ideas, or system contributions that make the work stand out.

Text-guided framework for full-torso anatomy and CT generation
Anatomy-aware HDR autoencoder for high-fidelity feature extraction
Latent feature generator aligning CT images with text prompts
🔎 Similar Papers
No similar papers found.