AI Summary
To address the poor generalizability and limited robustness of LDCT-based AI models for early lung cancer diagnosis, this work introduces the first cross-dataset AI benchmark framework for pulmonary nodule detection and lung cancer diagnosis, systematically evaluating 3D models on the open-access DLCS dataset (over 2,000 scans). We propose Strategic Warm-Start++ (SWS++), a task-adaptive pretraining method that uses candidate lesion patches generated by the detection pipeline to pretrain the malignancy classification backbone. We further standardize detection evaluation on the Competition Performance Metric (CPM) and publicly release all code, pretrained models, and expert annotations. SWS++ achieves AUCs of 0.71–0.90 across multi-center clinical datasets, matching or surpassing existing pretrained and foundation models, including Models Genesis and Med3D. Moreover, our detection models demonstrate strong generalization on external benchmarks such as LUNA16 and NLST-3D+.
Abstract
Lung cancer remains the leading cause of cancer-related mortality worldwide, and early detection through low-dose computed tomography (LDCT) has shown significant promise in reducing death rates. With the growing integration of artificial intelligence (AI) into medical imaging, the development and evaluation of robust AI models require access to large, well-annotated datasets. In this study, we demonstrate the utility of the Duke Lung Cancer Screening (DLCS) dataset, the largest open-access LDCT dataset, with over 2,000 scans and 3,000 expert-verified nodules. We benchmark deep learning models for both 3D nodule detection and lung cancer classification across internal and external datasets, including LUNA16, LUNA25, and NLST-3D+. For detection, we develop two MONAI-based RetinaNet models (DLCSD-mD and LUNA16-mD), evaluated using the Competition Performance Metric (CPM). For classification, we compare five models: two state-of-the-art pretrained models (Models Genesis and Med3D), a self-supervised foundation model (FMCB), a randomly initialized ResNet50, and our proposed Strategic Warm-Start++ (SWS++) model. SWS++ uses curated candidate patches to pretrain a classification backbone within the same detection pipeline, enabling task-relevant feature learning. Our models demonstrated strong generalizability, with SWS++ achieving comparable or superior performance to existing foundation models across multiple datasets (AUC: 0.71 to 0.90). All code, models, and data are publicly released to promote reproducibility and collaboration. This work establishes a standardized benchmarking resource for lung cancer AI research, supporting future efforts in model development, validation, and clinical translation.
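For readers unfamiliar with the Competition Performance Metric named in the abstract: as defined for the LUNA16 challenge, CPM is the mean detection sensitivity at seven operating points on the FROC curve (1/8, 1/4, 1/2, 1, 2, 4, and 8 false positives per scan). The following is a minimal sketch of that calculation, not the authors' evaluation code. The function name, the linear interpolation between FROC points, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

# The seven false-positives-per-scan operating points of the LUNA16 CPM definition.
CPM_FP_RATES = (0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0)


def competition_performance_metric(fps_per_scan, sensitivity):
    """Mean FROC sensitivity at the seven CPM operating points.

    fps_per_scan: non-decreasing false-positive rates of a FROC curve.
    sensitivity:  detection sensitivity measured at each of those rates.
    (Hypothetical helper; the interpolation scheme is an assumption.)
    """
    fps = np.asarray(fps_per_scan, dtype=float)
    sens = np.asarray(sensitivity, dtype=float)
    if fps.ndim != 1 or fps.shape != sens.shape:
        raise ValueError("fps_per_scan and sensitivity must be 1-D and equally long")
    if np.any(np.diff(fps) < 0):
        raise ValueError("fps_per_scan must be sorted in non-decreasing order")
    # Linearly interpolate the curve at each operating point; np.interp clamps
    # to the first/last measured sensitivity outside the measured range.
    sampled = np.interp(CPM_FP_RATES, fps, sens)
    return float(sampled.mean())


# Toy FROC curve with made-up values (not results from the paper).
toy_fps = [0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
toy_sens = [0.60, 0.70, 0.78, 0.84, 0.88, 0.91, 0.93]
cpm = competition_performance_metric(toy_fps, toy_sens)
print(f"CPM = {cpm:.3f}")
```

Because CPM averages over low and high false-positive rates alike, a detector cannot score well by being sensitive only at permissive thresholds, which is one reason a single CPM value is convenient for comparing detectors across datasets.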