🤖 AI Summary
AI models for radiographic imaging face challenges in cross-population generalizability, robustness, and fairness. Method: We propose a fine-grained, demographically controllable synthetic data framework, introducing RoentGen-v2—the first chest X-ray text-to-image diffusion model jointly conditioned on sex, age, and race/ethnicity—to generate over 565,000 clinically plausible chest X-ray images. We adopt a novel “synthetic-data-supervised pretraining + real-data fine-tuning” paradigm and conduct multicenter external validation on over 137,000 real-world images from five institutions. Contribution/Results: Our approach improves downstream disease classification accuracy by 6.5% over baseline models—more than double the 2.7% gain achieved by conventional mixed-data methods—and reduces the inter-group underdiagnosis fairness gap by 19.3%, significantly enhancing model fairness and clinical deployability.
📝 Abstract
Achieving robust performance and fairness across diverse patient populations remains a challenge in developing clinically deployable deep learning models for diagnostic imaging. Synthetic data generation has emerged as a promising strategy to address limitations in dataset scale and diversity. We introduce RoentGen-v2, a text-to-image diffusion model for chest radiographs that enables fine-grained control over both radiographic findings and patient demographic attributes, including sex, age, and race/ethnicity. RoentGen-v2 is the first model to generate clinically plausible images with demographic conditioning, facilitating the creation of a large, demographically balanced synthetic dataset comprising over 565,000 images. We use this large synthetic dataset to evaluate optimal training pipelines for downstream disease classification models. In contrast to prior work that combines real and synthetic data naively, we propose an improved training strategy that leverages synthetic data for supervised pretraining, followed by fine-tuning on real data. Through extensive evaluation on over 137,000 chest radiographs from five institutions, we demonstrate that synthetic pretraining consistently improves model performance, generalization to out-of-distribution settings, and fairness across demographic subgroups. Across datasets, synthetic pretraining led to a 6.5% accuracy increase in the performance of downstream classification models, compared to a modest 2.7% increase when naively combining real and synthetic data. This performance improvement coincides with a 19.3% reduction in the underdiagnosis fairness gap. These results highlight the potential of synthetic imaging to advance equitable and generalizable medical deep learning under real-world data constraints. We open source our code, trained models, and synthetic dataset at https://github.com/StanfordMIMI/RoentGen-v2.
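The two-stage training strategy described above can be illustrated with a toy sketch: supervised pretraining on a large (synthetic) dataset, then fine-tuning the same weights on a smaller real dataset at a lower learning rate. Everything here is an illustrative assumption — a 1-D logistic model, a Gaussian distribution shift standing in for the synthetic/real domain gap, and made-up dataset sizes and hyperparameters — not the paper's actual architecture or pipeline.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_epochs(w, b, data, lr, epochs):
    """Plain SGD on binary cross-entropy for a 1-D logistic model."""
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(w * x + b)
            g = p - y            # gradient of the loss w.r.t. the logit
            w -= lr * g * x
            b -= lr * g
    return w, b

def make_data(n, shift, rng):
    """Labeled 1-D points: class 1 near +1+shift, class 0 near -1+shift."""
    out = []
    for _ in range(n):
        y = 1 if rng.random() < 0.5 else 0
        x = rng.gauss((2 * y - 1) + shift, 0.5)
        out.append((x, y))
    return out

rng = random.Random(0)
# Large synthetic set with a slight distribution shift; small real set.
synthetic = make_data(2000, shift=0.2, rng=rng)
real = make_data(100, shift=0.0, rng=rng)

# Stage 1: supervised pretraining on synthetic data.
w, b = sgd_epochs(0.0, 0.0, synthetic, lr=0.05, epochs=3)
# Stage 2: fine-tuning on real data with a smaller learning rate.
w, b = sgd_epochs(w, b, real, lr=0.01, epochs=10)

test_set = make_data(1000, shift=0.0, rng=rng)
acc = sum(int((sigmoid(w * x + b) > 0.5) == (y == 1))
          for x, y in test_set) / len(test_set)
print(f"accuracy after pretrain + fine-tune: {acc:.2f}")
```

The key design point the sketch captures is that the fine-tuning stage starts from the pretrained weights rather than mixing synthetic and real examples into one pool, which is the naive combination the abstract contrasts against.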