🤖 AI Summary
To address the scarcity of medical imaging data, this study systematically investigates high-fidelity synthetic chest X-ray generation using latent diffusion models (LDMs). We propose two novel guidance strategies: (1) single-disease label–conditioned generation and (2) segmentation mask–guided generation enhanced by geometric transformations. Additionally, we introduce a radiologist-in-the-loop feedback mechanism and proxy model fine-tuning to significantly improve clinical credibility and task-specific utility of synthetic data. Extensive evaluation across multiple datasets—including CheXpert—demonstrates statistically significant improvements: classification F1-score mean increases by up to 0.1505 (*P* = 0.0031), and segmentation Dice score mean rises by up to 0.1458 (*P* = 0.0064), both surviving Bonferroni correction and exhibiting cross-dataset robustness. To our knowledge, this work establishes the first clinical需求–driven framework that jointly optimizes synthetic data quality and downstream task performance.
📝 Abstract
Purpose: To explore best-practice approaches for generating synthetic chest X-ray images and augmenting medical imaging datasets to optimize the performance of deep learning models in downstream tasks like classification and segmentation. Materials and Methods: We utilized a latent diffusion model to condition the generation of synthetic chest X-rays on text prompts and/or segmentation masks. We explored methods like using a proxy model and using radiologist feedback to improve the quality of synthetic data. These synthetic images were then generated from relevant disease information or geometrically transformed segmentation masks and added to ground truth training set images from the CheXpert, CANDID-PTX, SIIM, and RSNA Pneumonia datasets to measure improvements in classification and segmentation model performance on the test sets. F1 and Dice scores were used to evaluate classification and segmentation respectively. One-tailed t-tests with Bonferroni correction assessed the statistical significance of performance improvements with synthetic data. Results: Across all experiments, the synthetic data we generated resulted in a maximum mean classification F1 score improvement of 0.150453 (CI: 0.099108-0.201798; P=0.0031) compared to using only real data. For segmentation, the maximum Dice score improvement was 0.14575 (CI: 0.108267-0.183233; P=0.0064). Conclusion: Best practices for generating synthetic chest X-ray images for downstream tasks include conditioning on single-disease labels or geometrically transformed segmentation masks, as well as potentially using proxy modeling for fine-tuning such generations.