🤖 AI Summary
This study addresses clinical needs in lung cancer radiogenomics—specifically, somatic mutation detection (KRAS/EGFR) and T-stage prediction—from 3D chest CT. We systematically compare two paradigms: supervised learning (FMCIB pretraining + XGBoost) and self-supervised learning (DINOv2 feature extraction + ABMIL). To our knowledge, this is the first multi-task evaluation of both approaches for concurrent mutation classification and anatomical staging. Results show the supervised model achieves higher mutation detection accuracy (KRAS: 0.846; EGFR: 0.883), demonstrating superior discriminability for molecular biomarkers. In contrast, the self-supervised approach attains stronger T-stage prediction accuracy (0.797) and significantly better cross-center generalization. Our key contribution is establishing that paradigm selection must be task-driven: supervised learning is preferable for targeted mutation detection, whereas self-supervised representation learning—combined with attention-based multiple-instance learning—is more suitable for anatomy-oriented staging requiring robust generalizability across heterogeneous clinical sites.
📝 Abstract
Lung cancer is the leading cause of cancer mortality worldwide, and non-invasive methods for detecting key mutations and for staging are essential to improving patient outcomes. Here, we compare two machine learning models, FMCIB+XGBoost (a supervised model with domain-specific pretraining) and DINOv2+ABMIL (a self-supervised model with attention-based multiple-instance learning), on 3D lung nodule data from the Stanford Radiogenomics and Lung-CT-PT-Dx cohorts. For KRAS and EGFR mutation detection, FMCIB+XGBoost consistently outperformed DINOv2+ABMIL, achieving accuracies of 0.846 and 0.883, respectively. In cancer staging, DINOv2+ABMIL demonstrated competitive generalization, achieving an accuracy of 0.797 for T-stage prediction in the Lung-CT-PT-Dx cohort, suggesting that self-supervised learning (SSL) adapts well across diverse datasets. Our results underscore the clinical utility of supervised models for mutation detection and highlight the potential of SSL to improve staging generalization, while identifying mutation sensitivity as an area for improvement.
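The SSL pipeline described above pools per-instance DINOv2 features into a single bag-level representation via attention-based multiple-instance learning (ABMIL). A minimal NumPy sketch of the standard ABMIL pooling operation (Ilse et al.-style attention; the matrices `V` and `w` stand in for learned parameters, and the random features stand in for DINOv2 embeddings — both are illustrative, not the authors' actual implementation):

```python
import numpy as np

def abmil_pool(instance_feats, V, w):
    """Attention-based MIL pooling.

    instance_feats: (K, D) per-instance embeddings (e.g., DINOv2 features
                    for K patches/slices of one nodule)
    V: (H, D) and w: (H,) — attention parameters (learned in practice)

    Returns the (D,) bag embedding and the (K,) attention weights.
    """
    # Unnormalized attention score per instance: w^T tanh(V h_k)
    scores = w @ np.tanh(V @ instance_feats.T)          # shape (K,)
    # Softmax over instances (shifted for numerical stability)
    scores = scores - scores.max()
    attn = np.exp(scores) / np.exp(scores).sum()        # sums to 1
    # Bag embedding: attention-weighted sum of instance features
    bag = attn @ instance_feats                         # shape (D,)
    return bag, attn

# Toy usage: 6 instances with 8-dim features, 4-dim attention hidden size
rng = np.random.default_rng(0)
feats = rng.standard_normal((6, 8))
V = rng.standard_normal((4, 8))
w = rng.standard_normal(4)
bag, attn = abmil_pool(feats, V, w)
print(bag.shape, attn.shape)
```

The resulting bag embedding would then feed a staging classifier head; the attention weights offer a built-in view of which instances drive the prediction.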