CheXGenBench: A Unified Benchmark For Fidelity, Privacy and Utility of Synthetic Chest Radiographs

📅 2025-05-15
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This paper addresses fundamental challenges in evaluating synthetic chest X-ray images: the fragmentation of assessment across fidelity, privacy risk, and clinical utility; inconsistent evaluation standards; and outdated model coverage. To this end, the authors introduce the first unified benchmark for medical image synthesis evaluation. The method proposes a multidimensional quantitative protocol integrating authenticity, privacy preservation, and clinical utility, covering 11 state-of-the-art text-to-image models, standardized data splits, and over 20 reproducible metrics. It unifies advanced generative models (e.g., Sana), privacy leakage detection algorithms, radiographic quality assessment, and downstream lesion classification validation. Key contributions include: (1) establishing a new paradigm for evaluating medical image generation; (2) releasing SynthCheX-75K, a high-fidelity synthetic chest X-ray dataset; (3) exposing critical flaws in existing evaluation methodologies; and (4) substantially improving the consistency and clinical relevance of cross-model comparisons.

๐Ÿ“ Abstract
We introduce CheXGenBench, a rigorous and multifaceted evaluation framework for synthetic chest radiograph generation that simultaneously assesses fidelity, privacy risks, and clinical utility across state-of-the-art text-to-image generative models. Despite rapid advancements in generative AI for real-world imagery, medical domain evaluations have been hindered by methodological inconsistencies, outdated architectural comparisons, and disconnected assessment criteria that rarely address the practical clinical value of synthetic samples. CheXGenBench overcomes these limitations through standardised data partitioning and a unified evaluation protocol comprising over 20 quantitative metrics that systematically analyse generation quality, potential privacy vulnerabilities, and downstream clinical applicability across 11 leading text-to-image architectures. Our results reveal critical inefficiencies in the existing evaluation protocols, particularly in assessing generative fidelity, leading to inconsistent and uninformative comparisons. Our framework establishes a standardised benchmark for the medical AI community, enabling objective and reproducible comparisons while facilitating seamless integration of both existing and future generative models. Additionally, we release a high-quality, synthetic dataset, SynthCheX-75K, comprising 75K radiographs generated by the top-performing model (Sana 0.6B) in our benchmark to support further research in this critical domain. Through CheXGenBench, we establish a new state-of-the-art and release our framework, models, and SynthCheX-75K dataset at https://raman1121.github.io/CheXGenBench/
Problem

Research questions and friction points this paper is trying to address.

Evaluates synthetic chest radiograph fidelity, privacy, and clinical utility
Addresses inconsistent medical AI evaluation methods and criteria
Standardizes benchmarks for generative model comparisons in healthcare
Innovation

Methods, ideas, or system contributions that make the work stand out.

Standardized evaluation with 20+ metrics
Unified protocol for 11 text-to-image models
High-quality synthetic dataset SynthCheX-75K
Authors

Raman Dutt, University of Edinburgh (Medical Image Analysis, Deep Learning, Parameter-Efficient Fine-Tuning)
Pedro Sanchez, Sinkove
Yongchen Yao, The University of Edinburgh
Steven McDonagh, Senior Lecturer, University of Edinburgh (Artificial Intelligence, Computer Vision, Biomedical Imaging)
S. Tsaftaris, The University of Edinburgh
Timothy Hospedales, The University of Edinburgh; Samsung AI Center, Cambridge