AI Summary
AI-based lung cancer screening is hindered by the scarcity of high-quality clinical CT data, which limits model generalizability and clinical robustness. To address this, we propose SYN-LUNGS, the first anatomy-aware digital twin framework for pulmonary nodule simulation. Our method integrates XCAT3, a high-fidelity anthropomorphic phantom platform, with X-Lesions, a parametric nodule generator supporting multi-attribute control (size, location, and appearance), and DukeSim, a CT physics simulator that models vendor and acquisition-parameter variability. Together these establish a closed-loop synthetic paradigm capable of modeling rare nodule phenotypes and enabling interpretable synthesis. Leveraging this framework, we generate a synthetic CT dataset comprising 3,072 nodule images from 1,044 simulated CT scans, with pixel-level annotations. When used in joint training with clinical data, the synthetic data improves nodule detection performance by 10%, while segmentation and classification accuracy increase by 2-9%. The augmented models also generalize better across diverse CT scanners and heterogeneous patient populations, improving clinical applicability.
Abstract
AI models for lung cancer screening are limited by data scarcity, which impacts generalizability and clinical applicability. Generative models address this issue but are constrained by the variability of their training data. We introduce SYN-LUNGS, a framework for generating high-quality 3D CT images with detailed annotations. SYN-LUNGS integrates XCAT3 phantoms for digital twin generation, X-Lesions for nodule simulation (varying size, location, and appearance), and DukeSim for CT image formation with vendor and parameter variability. The dataset includes 3,072 nodule images from 1,044 simulated CT scans, with 512 lesions and 174 digital twins. Models trained on combined clinical and simulated data outperform clinical-only models, achieving a 10% improvement in detection, 2-9% in segmentation and classification, and enhanced synthesis. By incorporating anatomy-informed simulations, SYN-LUNGS provides a scalable approach to AI model development, particularly for representing rare diseases and improving model reliability.