Comparative Analysis of Machine Learning Models for Lung Cancer Mutation Detection and Staging Using 3D CT Scans

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses clinical needs in lung cancer radiogenomics—specifically, somatic mutation detection (KRAS/EGFR) and T-stage prediction—from 3D chest CT. We systematically compare two paradigms: supervised learning (FMCIB pretraining + XGBoost) and self-supervised learning (DINOv2 feature extraction + ABMIL). To our knowledge, this is the first multi-task evaluation of both approaches for concurrent mutation classification and anatomical staging. Results show the supervised model achieves higher mutation detection accuracy (KRAS: 0.846; EGFR: 0.883), demonstrating superior discriminability for molecular biomarkers. In contrast, the self-supervised approach attains stronger T-stage prediction accuracy (0.797) and significantly better cross-center generalization. Our key contribution is establishing that paradigm selection must be task-driven: supervised learning is preferable for targeted mutation detection, whereas self-supervised representation learning—combined with attention-based multiple-instance learning—is more suitable for anatomy-oriented staging requiring robust generalizability across heterogeneous clinical sites.
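As a rough illustration of the attention-based multiple-instance learning (ABMIL) pooling step the self-supervised pipeline relies on, here is a minimal NumPy sketch: per-instance embeddings (e.g. features extracted from 3D CT patches by a frozen encoder) are combined into a single bag-level representation via learned attention weights. The parameters `V` and `w`, the embedding dimension, and the instance count are hypothetical placeholders, not values from the paper.

```python
import numpy as np

def abmil_pool(instances, V, w):
    """Attention-based MIL pooling over a bag of instance embeddings.
    instances: (K, D) embeddings, e.g. from a frozen DINOv2-style encoder.
    V: (L, D) and w: (L,) are learned attention parameters (hypothetical here).
    Returns the (D,) bag embedding and the (K,) attention weights."""
    scores = w @ np.tanh(V @ instances.T)   # (K,) unnormalized attention scores
    attn = np.exp(scores - scores.max())    # numerically stable softmax
    attn /= attn.sum()                      # weights over instances sum to 1
    return attn @ instances, attn           # weighted sum -> bag embedding

# Toy example: 16 instance embeddings of dimension 64, random parameters.
rng = np.random.default_rng(0)
H = rng.normal(size=(16, 64))
V = rng.normal(size=(32, 64))
w = rng.normal(size=32)
bag, attn = abmil_pool(H, V, w)
```

The attention weights also provide a degree of interpretability, indicating which instances (patches or slices) drove the bag-level prediction.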

📝 Abstract
Lung cancer is the leading cause of cancer mortality worldwide, and non-invasive methods for detecting key mutations and staging are essential for improving patient outcomes. Here, we compare the performance of two machine learning models (FMCIB+XGBoost, a supervised model with domain-specific pretraining, and DINOv2+ABMIL, a self-supervised model with attention-based multiple-instance learning) on 3D lung nodule data from the Stanford Radiogenomics and Lung-CT-PT-Dx cohorts. In the task of KRAS and EGFR mutation detection, FMCIB+XGBoost consistently outperformed DINOv2+ABMIL, achieving accuracies of 0.846 and 0.883 for KRAS and EGFR mutations, respectively. In cancer staging, DINOv2+ABMIL demonstrated competitive generalization, achieving an accuracy of 0.797 for T-stage prediction in the Lung-CT-PT-Dx cohort, suggesting SSL's adaptability across diverse datasets. Our results emphasize the clinical utility of supervised models in mutation detection and highlight the potential of SSL to improve staging generalization, while identifying areas for enhancement in mutation sensitivity.
Problem

Research questions and friction points this paper is trying to address.

Comparing ML models for lung cancer mutation detection
Evaluating performance on 3D CT scans for staging
Assessing clinical utility of supervised vs self-supervised learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Supervised FMCIB+XGBoost for mutation detection
Self-supervised DINOv2+ABMIL for cancer staging
3D CT scans for non-invasive lung cancer analysis
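The supervised pipeline above amounts to extracting pretrained-encoder features per nodule and fitting a boosted-tree classifier on them. A minimal sketch, using scikit-learn's gradient boosting as a stand-in for XGBoost and random vectors as placeholders for the FMCIB embeddings (both are assumptions for illustration, not the paper's actual data or library):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Placeholder features: 120 nodules x 512-d embeddings (real pipelines would
# use features from a pretrained 3D encoder such as FMCIB).
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 512))
# Toy binary label (e.g. mutation present/absent) correlated with one feature.
y = (X[:, 0] + 0.5 * rng.normal(size=120) > 0).astype(int)

# Gradient boosting stands in for XGBoost here; the interface is analogous.
clf = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
```

In practice, the reported accuracies (KRAS 0.846, EGFR 0.883) would come from evaluation on held-out cohort data rather than a toy cross-validation like this.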
Yiheng Li
Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA 94305, USA
Francisco Carrillo-Perez
Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA 94305, USA
Mohammed Alawad
National Center for AI (NCAI), Saudi Data and AI Authority (SDAIA), Riyadh, Saudi Arabia
Olivier Gevaert
Stanford University
machine learning · bioinformatics · epigenomics · radiogenomics · digital twins