A Generative Foundation Model for Chest Radiography

📅 2025-09-04
🤖 AI Summary
Medical imaging annotation data are scarce and lack demographic diversity, undermining AI model reliability and fairness. To address this, we propose ChexGen—the first generative vision-language foundation model for chest X-ray analysis—built upon a latent diffusion Transformer architecture and pre-trained on 960K X-ray–report pairs. It enables fine-grained image synthesis conditioned on text, segmentation masks, and bounding boxes, ensuring semantic alignment between generated images and clinical reports. Our key contribution is a unified generative framework that synthesizes diverse, demographically representative patient cohorts while explicitly detecting and mitigating dataset biases. Experiments demonstrate that radiologists rate the generated images highly; quantitative metrics confirm high fidelity and clinical plausibility. In few-shot settings, ChexGen-augmented data significantly improve performance across classification, detection, and segmentation tasks—validating its effectiveness, generalizability, and capacity to enhance fairness in diagnostic AI.

📝 Abstract
The scarcity of well-annotated, diverse medical images is a major hurdle for developing reliable AI models in healthcare. Substantial technical advances have been made in generative foundation models for natural images. Here we develop ChexGen, a generative vision-language foundation model that introduces a unified framework for text-, mask-, and bounding box-guided synthesis of chest radiographs. Built upon the latent diffusion transformer architecture, ChexGen was pretrained on the largest curated chest X-ray dataset to date, consisting of 960,000 radiograph-report pairs. ChexGen achieves accurate synthesis of radiographs, as validated by expert evaluations and quantitative metrics. We demonstrate the utility of ChexGen for training data augmentation and supervised pretraining, which led to performance improvements across disease classification, detection, and segmentation tasks using a small fraction of training data. Further, our model enables the creation of diverse patient cohorts that enhance model fairness by detecting and mitigating demographic biases. Our study supports the transformative role of generative foundation models in building more accurate, data-efficient, and equitable medical AI systems.
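The abstract describes text-guided synthesis with a latent diffusion transformer. As general background (not the paper's implementation, whose details are not given here), text-conditioned diffusion models typically combine conditional and unconditional noise predictions via classifier-free guidance at each denoising step. A minimal NumPy sketch of that combination step, with a hypothetical function name:

```python
import numpy as np

def cfg_noise(eps_uncond: np.ndarray, eps_cond: np.ndarray, w: float) -> np.ndarray:
    """Classifier-free guidance: steer the denoiser's noise prediction
    toward the text-conditioned direction with guidance scale w.
    w = 0 -> unconditional; w = 1 -> purely conditional; w > 1 -> amplified guidance."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy example on a 2x2 "latent": guidance interpolates between (or, for
# w > 1, extrapolates beyond) the two noise predictions.
eps_u = np.zeros((2, 2))   # stand-in for the unconditional prediction
eps_c = np.ones((2, 2))    # stand-in for the text-conditioned prediction
print(cfg_noise(eps_u, eps_c, 0.0))  # equals eps_u (all zeros)
print(cfg_noise(eps_u, eps_c, 1.0))  # equals eps_c (all ones)
print(cfg_noise(eps_u, eps_c, 7.5))  # amplified: 7.5 everywhere
```

A higher guidance scale trades sample diversity for tighter adherence to the conditioning text, which is one reason such models can keep generated radiographs semantically aligned with their reports.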
Problem

Research questions and friction points this paper is trying to address.

Addressing scarcity of annotated medical images for AI
Developing generative model for chest radiograph synthesis
Enhancing medical AI accuracy and fairness through augmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative vision-language model for chest radiographs
Latent diffusion transformer architecture for synthesis
Training data augmentation and bias mitigation
Authors

Yuanfeng Ji
Stanford; HKU
Computer Vision, Medical Image Analysis
Dan Lin
Nanyang Technological University (NTU)
Data Mining and Machine Learning, Computer Vision
Xiyue Wang
Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA
Lu Zhang
Department of Radiology, The First Affiliated Hospital of Jinan University, Guangzhou, China
Wenhui Zhou
Department of Radiology, Stanford University School of Medicine, Stanford, CA, USA
Chongjian Ge
Department of Computer Science, School of Computing and Data Science, The University of Hong Kong, Hong Kong SAR, China
Ruihang Chu
Tsinghua University, CUHK, Wan
Generative AI, Vision-Language Model, Computer Vision
Xiaoli Yang
Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA
Junhan Zhao
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Junsong Chen
NVIDIA Research Intern, MMLab@HKU
Generative Model, Large Language Model
Xiangde Luo
Stanford University
Medical Image Analysis, Computer Vision, Computational Pathology
Sen Yang
Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA
Jin Fang
Department of Radiology, The First Affiliated Hospital of Jinan University, Guangzhou, China
Ping Luo
National University of Defense Technology
Distributed Computing
Ruijiang Li
Stanford University
Biomarker, Cancer Imaging, Radiomics, Machine Learning