🤖 AI Summary
Chest X-ray synthesis faces challenges including heterogeneous lesion morphology, tight anatomical–pathological coupling, scarce expert annotations, and domain shift, hindering fine-grained controllable generation. To address this, we propose an anatomy-guided progressive conditional generation framework: first, leveraging clinical prompts and a pre-trained medical foundation model to generate anatomy-aware pseudo-semantic pathology masks; then, jointly synthesizing high-fidelity X-ray images conditioned on these masks. A multi-expert filtering module is incorporated to enhance clinical plausibility. Our method is the first to enable explicit anatomical structure–guided controllable pathology mask generation, balancing visual realism and semantic utility. Radiologist evaluation indicates that 78% of synthesized X-rays exhibit clinical realism, and over 40% of pseudo-masks support reliable segmentation. Consequently, detection and segmentation models trained on our synthetic data demonstrate significantly improved generalization under data-scarce conditions.
📝 Abstract
Medical image synthesis has become an essential strategy for augmenting datasets and improving model generalization in data-scarce clinical settings. However, fine-grained and controllable synthesis remains difficult due to limited high-quality annotations and domain shifts across datasets. Existing methods, often designed for natural images or well-defined tumors, struggle to generalize to chest radiographs, where disease patterns are morphologically diverse and tightly intertwined with anatomical structures. To address these challenges, we propose AURAD, a controllable radiology synthesis framework that jointly generates high-fidelity chest X-rays and pseudo-semantic masks. Unlike prior approaches that rely on randomly sampled masks (limiting diversity, controllability, and clinical relevance), our method learns to generate masks that capture multi-pathology coexistence and anatomical–pathological consistency. It follows a progressive pipeline: pseudo-masks are first generated from clinical prompts conditioned on anatomical structures, and then used to guide image synthesis. We also leverage pretrained expert medical models to filter outputs and ensure clinical plausibility. Beyond visual realism, the synthesized masks also serve as labels for downstream tasks such as detection and segmentation, bridging the gap between generative modeling and real-world clinical applications. Extensive experiments and blinded radiologist evaluations demonstrate the effectiveness and generalizability of our method across tasks and datasets. In particular, 78% of our synthesized images are classified as authentic by board-certified radiologists, and over 40% of predicted segmentation overlays are rated as clinically useful. All code, pre-trained models, and the synthesized dataset will be released upon publication.
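The progressive pipeline described above (clinical prompt → anatomy-conditioned pseudo-mask → mask-conditioned image synthesis → expert filtering) can be sketched at the interface level. The sketch below is a minimal, hypothetical stand-in: every function name is illustrative, and the trivial NumPy stubs replace the pretrained foundation model, the conditional image generator, and the multi-expert filter that AURAD actually uses.

```python
import numpy as np

def generate_pseudo_mask(prompt, anatomy_prior, seed=0):
    """Stage 1 (stand-in): sample a pathology mask conditioned on an
    anatomical prior. AURAD uses a pretrained medical foundation model;
    here a thresholded random field restricted to the allowed anatomy
    region merely illustrates the contract: mask ⊆ anatomy_prior."""
    rng = np.random.default_rng(seed)
    field = rng.random(anatomy_prior.shape)
    # Pathology may only appear where the anatomical prior permits it.
    return (field > 0.7) & anatomy_prior

def synthesize_image(mask, seed=0):
    """Stage 2 (stand-in): mask-conditioned image synthesis. A real
    implementation would be a conditional generative model; here we
    brighten masked regions over a noisy background to mimic opacities."""
    rng = np.random.default_rng(seed)
    background = rng.normal(0.4, 0.05, mask.shape)
    return np.clip(background + 0.3 * mask, 0.0, 1.0)

def expert_filter(image, mask, min_area=5):
    """Stand-in for multi-expert filtering, reduced to one toy check:
    reject samples whose pathology region is implausibly small."""
    return mask.sum() >= min_area

def progressive_pipeline(prompt, anatomy_prior, seed=0):
    """Run the two generation stages, keep only samples that pass the
    filter; the surviving mask doubles as a segmentation label."""
    mask = generate_pseudo_mask(prompt, anatomy_prior, seed)
    image = synthesize_image(mask, seed)
    if expert_filter(image, mask):
        return image, mask
    return None
```

The key design point the sketch preserves is ordering: the mask is produced first, under an explicit anatomical constraint, so the image generator never has to invent anatomy–pathology consistency on its own, and the mask is available for free as a downstream detection/segmentation label.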