🤖 AI Summary
Well-annotated medical imaging data are scarce and lack demographic diversity, undermining AI model reliability and fairness. To address this, we propose ChexGen—the first generative vision-language foundation model for chest X-ray analysis—built upon a latent diffusion Transformer architecture and pretrained on 960K X-ray–report pairs. It enables fine-grained image synthesis conditioned on text, segmentation masks, and bounding boxes, ensuring semantic alignment between generated images and clinical reports. Our key contribution is a unified generative framework that synthesizes diverse, demographically representative patient cohorts while explicitly detecting and mitigating dataset biases. Radiologists rate the generated images highly, and quantitative metrics confirm their fidelity and clinical plausibility. In few-shot settings, ChexGen-augmented data significantly improve performance across classification, detection, and segmentation tasks—validating its effectiveness, generalizability, and capacity to enhance fairness in diagnostic AI.
📝 Abstract
The scarcity of well-annotated, diverse medical images is a major hurdle for developing reliable AI models in healthcare. Substantial technical advances have been made in generative foundation models for natural images. Here we develop ChexGen, a generative vision-language foundation model that introduces a unified framework for text-, mask-, and bounding box-guided synthesis of chest radiographs. Built upon the latent diffusion transformer architecture, ChexGen was pretrained on the largest curated chest X-ray dataset to date, consisting of 960,000 radiograph-report pairs. ChexGen achieves accurate synthesis of radiographs, as validated by expert evaluations and quantitative metrics. We demonstrate the utility of ChexGen for training data augmentation and supervised pretraining, which led to performance improvements across disease classification, detection, and segmentation tasks using a small fraction of training data. Further, our model enables the creation of diverse patient cohorts that enhance model fairness by detecting and mitigating demographic biases. Our study supports the transformative role of generative foundation models in building more accurate, data-efficient, and equitable medical AI systems.
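To make the text-conditioned latent diffusion idea concrete, here is a minimal toy sketch of a reverse-diffusion sampling loop in latent space. Everything in it is an illustrative assumption, not ChexGen's actual implementation: `toy_denoiser` is a stand-in for the diffusion Transformer, `cond` is a fake report embedding, and the update rule is a simplified interpolation rather than a real noise-prediction objective.

```python
import random

def toy_denoiser(z, cond, t):
    # Stand-in for the diffusion Transformer backbone: nudge the latent
    # toward the conditioning embedding, more strongly at low noise levels.
    w = 1.0 - t  # pull strength grows as the noise level t -> 0
    return [zi + 0.5 * w * (ci - zi) for zi, ci in zip(z, cond)]

def sample(cond, steps=50, dim=8, seed=0):
    """Toy reverse-diffusion loop: start from Gaussian noise in a latent
    space and iteratively denoise toward a text-conditioned latent."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for i in reversed(range(steps)):
        t = i / steps  # normalized noise level in [0, 1)
        z = toy_denoiser(z, cond, t)
    return z

# Hypothetical "report embedding" for a clinical phrase; in the real model
# this would come from a text encoder, and mask/box conditions would be
# injected analogously as extra inputs to the denoiser.
cond = [0.3] * 8
latent = sample(cond)
```

After enough steps the latent converges toward the conditioning vector, which is the intuition behind conditional generation: the denoiser steers random noise to a sample consistent with the prompt. A real decoder would then map this latent back to image space.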