Recognizing Pneumonia in Real-World Chest X-rays with a Classifier Trained with Images Synthetically Generated by Nano Banana

📅 2025-11-29
🤖 AI Summary
This study investigates whether pneumonia can be identified automatically by a classifier trained entirely on synthetic data. The authors use Google's Nano Banana generative model to synthesize chest X-ray (CXR) images and train a classifier on these images alone, without any real annotated data. The pipeline combines synthetic image generation, deep learning classifier training, and a post-processing step that aligns the synthetic images with real-world data. Evaluated on two real-world datasets, the 2018 RSNA Pneumonia Detection dataset (14,863 CXRs) and the Chest X-Ray dataset (5,856 CXRs), the model achieves an AUROC of 0.923 (AUPR: 0.900) and an AUROC of 0.824 (AUPR: 0.913), respectively. These results suggest that state-of-the-art generative models could support diagnostic model training with synthetic medical images alone, which may help address medical data scarcity and privacy constraints, although controlling the diversity of synthetic data through prompt design and aligning it with real data remain open challenges.
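The AUROC and AUPR figures above summarize how well the classifier's pneumonia scores rank positive cases above negative ones. The paper does not state how these metrics were computed, so the following is only a minimal sketch under that assumption: AUROC via the Mann-Whitney rank statistic, and AUPR as non-interpolated average precision (the step-wise definition also used by scikit-learn's `average_precision_score`). The functions `auroc` and `average_precision` are hypothetical helpers, not the authors' code.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUROC (Mann-Whitney U): the probability that a random
    positive case scores higher than a random negative case, ties counting 1/2."""
    n = len(scores)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative case")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPR: sum over thresholds of (recall gain) * precision."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AUPR needs at least one positive case")
    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # All samples sharing a score enter together, as one threshold.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))                         # 0.75
print(round(average_precision(labels, scores), 4))   # 0.8333
```

With constant (uninformative) scores, `auroc` returns 0.5 while `average_precision` returns the positive-class prevalence, so AUPR values should be read against each dataset's class balance rather than compared directly with AUROC.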

📝 Abstract
We trained a classifier with synthetic chest X-ray (CXR) images generated by Nano Banana, the latest AI model for image generation and editing released by Google. Although it had been trained only on synthetic data, the classifier, when applied directly to real-world CXRs, achieved an AUROC of 0.923 (95% CI: 0.919–0.927) and an AUPR of 0.900 (95% CI: 0.894–0.907) in recognizing pneumonia in the 2018 RSNA Pneumonia Detection dataset (14,863 CXRs), and an AUROC of 0.824 (95% CI: 0.810–0.836) and an AUPR of 0.913 (95% CI: 0.904–0.922) in the Chest X-Ray dataset (5,856 CXRs). These external validation results on real-world data demonstrate the feasibility of this approach and suggest potential for synthetic data in medical AI development. Nonetheless, several limitations remain, including the difficulty of designing prompts that control the diversity of synthetic CXR data and the need for post-processing to align the synthetic images with real-world data. Moreover, as medical AI grows more sophisticated and accessible, substantial validation, regulatory approval, and ethical oversight will be required before clinical translation.
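The abstract reports 95% confidence intervals but does not say how they were obtained. One common approach, assumed here rather than taken from the paper, is a nonparametric percentile bootstrap: resample the test cases with replacement, recompute the metric on each resample, and take the 2.5th and 97.5th percentiles. The sketch below does this for AUROC using only the Python standard library; `bootstrap_auroc_ci`, `n_boot`, `seed`, and the toy data are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) AUROC; tied scores share their average rank."""
    n = len(scores)
    n_pos = sum(labels)
    n_neg = n - n_pos
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auroc_ci(
    labels: Sequence[int], scores: Sequence[float], n_boot: int = 1000, seed: int = 0
) -> Tuple[float, float, float]:
    """Return (point AUROC, 2.5th percentile, 97.5th percentile) over resamples."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("need both positive and negative cases")
    if n_boot < 2:
        raise ValueError("need at least two bootstrap resamples")
    rng = random.Random(seed)  # fixed seed makes the interval reproducible
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        # Skip resamples that happen to contain only one class (AUROC undefined).
        if 0 < sum(y) < n:
            stats.append(auroc(y, [scores[i] for i in idx]))
    cuts = statistics.quantiles(stats, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return auroc(labels, scores), cuts[0], cuts[-1]


# Toy data: pneumonia cases (label 1) tend to receive higher scores.
gen = random.Random(42)
labels = [1] * 200 + [0] * 200
scores = [gen.gauss(1.0, 1.0) for _ in range(200)] + [gen.gauss(0.0, 1.0) for _ in range(200)]
point, lo, hi = bootstrap_auroc_ci(labels, scores)
print(f"AUROC {point:.3f} (95% CI: {lo:.3f}-{hi:.3f})")
```

The same loop yields an AUPR interval if the metric function is swapped. Resampling individual images keeps the sketch simple; when a dataset contains several images per patient, resampling at the patient level is the more cautious choice.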

Problem

Research questions and friction points this paper is trying to address.

Training a pneumonia classifier using synthetic chest X-ray images
Validating the classifier's performance on real-world chest X-ray datasets
Addressing limitations in synthetic data diversity and alignment with real data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generating synthetic chest X-ray images with Google's Nano Banana model
Training classifier solely on synthetic data for pneumonia detection
Achieving AUROCs of 0.923 and 0.824 in external validation on two real-world CXR datasets