Improving Performance, Robustness, and Fairness of Radiographic AI Models with Finely-Controllable Synthetic Data

📅 2025-08-22
🤖 AI Summary
AI models for radiographic imaging face challenges in cross-population generalizability, robustness, and fairness. Method: We propose a fine-grained, demographically controllable synthetic data framework, introducing RoentGen-v2—the first text-to-image diffusion model jointly conditioned on sex, age, and race—to generate 565,000 clinically plausible chest X-ray images. We adopt a novel “synthetic-data-supervised pretraining + real-data fine-tuning” paradigm and conduct multicenter external validation across five institutions using 137,000 real-world images. Contribution/Results: Our approach improves downstream disease classification accuracy by 6.5% over baseline models—nearly doubling the gain achieved by conventional mixed-data methods—and reduces inter-group disparities in misdiagnosis rates by 19.3%, significantly enhancing model fairness and clinical deployability.

📝 Abstract
Achieving robust performance and fairness across diverse patient populations remains a challenge in developing clinically deployable deep learning models for diagnostic imaging. Synthetic data generation has emerged as a promising strategy to address limitations in dataset scale and diversity. We introduce RoentGen-v2, a text-to-image diffusion model for chest radiographs that enables fine-grained control over both radiographic findings and patient demographic attributes, including sex, age, and race/ethnicity. RoentGen-v2 is the first model to generate clinically plausible images with demographic conditioning, facilitating the creation of a large, demographically balanced synthetic dataset comprising over 565,000 images. We use this large synthetic dataset to evaluate optimal training pipelines for downstream disease classification models. In contrast to prior work that combines real and synthetic data naively, we propose an improved training strategy that leverages synthetic data for supervised pretraining, followed by fine-tuning on real data. Through extensive evaluation on over 137,000 chest radiographs from five institutions, we demonstrate that synthetic pretraining consistently improves model performance, generalization to out-of-distribution settings, and fairness across demographic subgroups. Across datasets, synthetic pretraining led to a 6.5% accuracy increase in the performance of downstream classification models, compared to a modest 2.7% increase when naively combining real and synthetic data. We observe this performance improvement simultaneously with the reduction of the underdiagnosis fairness gap by 19.3%. These results highlight the potential of synthetic imaging to advance equitable and generalizable medical deep learning under real-world data constraints. We open source our code, trained models, and synthetic dataset at https://github.com/StanfordMIMI/RoentGen-v2 .
Problem

Research questions and friction points this paper is trying to address.

Enhancing AI model robustness and fairness in medical imaging
Addressing dataset limitations with controllable synthetic data generation
Improving diagnostic accuracy and reducing demographic bias in radiology
Innovation

Methods, ideas, or system contributions that make the work stand out.

Text-to-image diffusion model for chest radiographs
Fine-grained control over findings and demographics
Synthetic pretraining followed by real data fine-tuning
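The "synthetic pretraining, then real-data fine-tuning" strategy above can be illustrated with a minimal toy sketch in plain Python. This is not the authors' actual pipeline (which pretrains deep classifiers on diffusion-generated chest X-rays); it only demonstrates the two-stage idea on a one-parameter-pair linear model, with all data and hyperparameters invented for the example:

```python
import random

def train(weights, data, lr=0.1, epochs=50):
    """Fit y = w*x + b with plain stochastic gradient descent."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

random.seed(0)

# Stage 1: supervised pretraining on a large, cheap "synthetic" corpus
# (noisy samples of y = 2.0*x + 1.0, standing in for generated images).
synthetic = [(x, 2.0 * x + 1.0 + random.gauss(0, 0.5))
             for x in (random.uniform(-1, 1) for _ in range(500))]
pretrained = train((0.0, 0.0), synthetic)

# Stage 2: fine-tune on a small "real" dataset from a slightly shifted
# distribution (true slope 2.2), mimicking the synthetic-to-real gap.
real = [(x, 2.2 * x + 1.0) for x in (random.uniform(-1, 1) for _ in range(20))]
w, b = train(pretrained, real, lr=0.05, epochs=20)
```

The design point the paper argues for is the ordering: the synthetic corpus supplies scale and demographic balance during pretraining, while the scarce real data gets the last word during fine-tuning, rather than being mixed into one training pool.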
Stefania L. Moroianu
Center for Artificial Intelligence in Medicine and Imaging, Stanford University; Department of Applied Physics, Stanford University
Christian Bluethgen
Radiologist, Clinician Scientist, USZ Zurich, AIMI Center, Stanford University
Radiology · Thoracic Imaging · Multimodal Machine Learning
Pierre Chambon
FAIR, Meta
Natural Language Processing
Mehdi Cherti
Postdoc at Forschungszentrum Jülich, LAION co-founder
Deep Learning · Scaling Laws · Multi-Modal Models
Jean-Benoit Delbrouck
Hugging Face, Stanford
Magdalini Paschali
Postdoctoral Scholar, Stanford University
Deep Learning · Computer Vision · Medical Imaging
Brandon Price
Department of Radiology & Imaging Sciences, Emory University; Department of Radiology, University of Florida College of Medicine
Judy Gichoya
Department of Radiology & Imaging Sciences, Emory University
Jenia Jitsev
Scalable Learning & Multi-Purpose AI (SLAMPAI) Lab, JSC, Forschungszentrum Juelich; ELLIS; LAION
Open Foundation Models & Datasets · Scaling Laws · Plasticity and Learning in Neural Networks
Curtis P. Langlotz
Professor of Radiology, Medicine, and Biomedical Data Science, Stanford University
Machine Learning · Computer Vision · Natural Language Processing · Decision Support Systems · Technology Assessment
Akshay S. Chaudhari
Center for Artificial Intelligence in Medicine and Imaging, Stanford University; Department of Radiology, Stanford University; Department of Biomedical Data Science, Stanford University