🤖 AI Summary
Well-annotated medical imaging data are scarce and lack demographic diversity, undermining AI model reliability and fairness. To address this, we propose ChexGen—the first generative vision-language foundation model for chest X-ray analysis—built upon a latent diffusion Transformer architecture and pretrained on 960K X-ray–report pairs. It enables fine-grained image synthesis conditioned on text, segmentation masks, and bounding boxes, ensuring semantic alignment between generated images and clinical reports. Our key contribution is a unified generative framework that synthesizes diverse, demographically representative patient cohorts while explicitly detecting and mitigating dataset biases. Radiologists rate the generated images highly, and quantitative metrics confirm their fidelity and clinical plausibility. In few-shot settings, ChexGen-augmented data significantly improve performance across classification, detection, and segmentation tasks—validating its effectiveness, generalizability, and capacity to enhance fairness in diagnostic AI.
📝 Abstract
The scarcity of well-annotated, diverse medical images is a major hurdle for developing reliable AI models in healthcare. Substantial technical advances have been made in generative foundation models for natural images. Here we develop ChexGen, a generative vision-language foundation model that introduces a unified framework for text-, mask-, and bounding box-guided synthesis of chest radiographs. Built upon the latent diffusion transformer architecture, ChexGen was pretrained on the largest curated chest X-ray dataset to date, consisting of 960,000 radiograph-report pairs. ChexGen achieves accurate synthesis of radiographs, as validated by expert evaluations and quantitative metrics. We demonstrate the utility of ChexGen for training data augmentation and supervised pretraining, which led to performance improvements across disease classification, detection, and segmentation tasks using a small fraction of training data. Further, our model enables the creation of diverse patient cohorts that enhance model fairness by detecting and mitigating demographic biases. Our study supports the transformative role of generative foundation models in building more accurate, data-efficient, and equitable medical AI systems.
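To make the text-conditioned latent diffusion idea concrete, here is a minimal toy sketch of a reverse-diffusion sampling loop in latent space. Everything in it is an illustrative assumption, not ChexGen's actual implementation: `toy_denoiser` is a stand-in for the diffusion Transformer, `cond` is a fake report embedding, and the update rule is a simplified interpolation rather than a real noise-prediction objective.

```python
import random

def toy_denoiser(z, cond, t):
    # Stand-in for the diffusion Transformer backbone: nudge the latent
    # toward the conditioning embedding, more strongly at low noise levels.
    w = 1.0 - t  # pull strength grows as the noise level t -> 0
    return [zi + 0.5 * w * (ci - zi) for zi, ci in zip(z, cond)]

def sample(cond, steps=50, dim=8, seed=0):
    """Toy reverse-diffusion loop: start from Gaussian noise in a latent
    space and iteratively denoise toward a text-conditioned latent."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for i in reversed(range(steps)):
        t = i / steps  # normalized noise level in [0, 1)
        z = toy_denoiser(z, cond, t)
    return z

# Hypothetical "report embedding" for a clinical phrase; in the real model
# this would come from a text encoder, and mask/box conditions would be
# injected analogously as extra inputs to the denoiser.
cond = [0.3] * 8
latent = sample(cond)
```

After enough steps the latent converges toward the conditioning vector, which is the intuition behind conditional generation: the denoiser steers random noise to a sample consistent with the prompt. A real decoder would then map this latent back to image space.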