Robust Semi-Supervised CT Radiomics for Lung Cancer Prognosis: Cost-Effective Learning with Limited Labels and SHAP Interpretation

📅 2025-07-10
🤖 AI Summary
Label scarcity, poor generalizability, and limited interpretability hinder prognostic modeling from lung CT images. Method: We propose a radiomics framework that combines pseudo-labeling-based semi-supervised learning with SHAP-based interpretability analysis. Multi-scale features are extracted via LoG and wavelet filtering; 56 dimensionality-reduction algorithms and 27 classifiers are systematically evaluated to build the semi-supervised pipeline, and pseudo-labeling is paired with SHAP to address both performance and clinical trustworthiness. Results: The best semi-supervised model achieves 0.90 cross-validation accuracy and 0.88 external validation accuracy, and semi-supervised learning outperforms the fully supervised baseline by up to 17%. With only 10% labeled data, performance remains strong, with more stable results and lower variance across external testing than supervised learning. The framework offers an efficient, stable, and interpretable approach to small-sample medical image prognostication.

📝 Abstract
Background: CT imaging is vital for lung cancer management, offering detailed visualization for AI-based prognosis. However, supervised learning (SL) models require large labeled datasets, limiting their real-world application in settings with scarce annotations. Methods: We analyzed CT scans from 977 patients across 12 datasets, extracting 1,218 radiomics features using Laplacian of Gaussian and wavelet filters via PyRadiomics. Dimensionality reduction was applied with 56 feature selection and extraction algorithms, and 27 classifiers were benchmarked. A semi-supervised learning (SSL) framework with pseudo-labeling utilized 478 unlabeled and 499 labeled cases. Model sensitivity was tested in three scenarios: varying labeled data in SL, increasing unlabeled data in SSL, and scaling both from 10% to 100%. SHAP analysis was used to interpret predictions. Cross-validation and external testing in two cohorts were performed. Results: SSL outperformed SL, improving overall survival prediction by up to 17%. The top SSL model, Random Forest plus XGBoost classifier, achieved 0.90 accuracy in cross-validation and 0.88 externally. SHAP analysis revealed enhanced feature discriminability in both SSL and SL, especially for Class 1 (survival greater than 4 years). SSL showed strong performance with only 10% labeled data, with more stable results compared to SL and lower variance across external testing, highlighting SSL's robustness and cost-effectiveness. Conclusion: We introduced a cost-effective, stable, and interpretable SSL framework for CT-based survival prediction in lung cancer, improving performance, generalizability, and clinical readiness by integrating SHAP explainability and leveraging unlabeled data.
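The pseudo-labeling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: a nearest-centroid classifier stands in for the Random Forest plus XGBoost model, and the function names, the 0.9 confidence threshold, and the round limit are all hypothetical choices made for illustration. The idea shown is the general one: train on labeled cases, assign labels to unlabeled cases the model is confident about, add them to the training set, and repeat.

```python
import math

def centroids(X, y):
    """Mean feature vector per class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict_proba(cents, x):
    """Softmax over negative Euclidean distances to each class centroid."""
    labels = sorted(cents)
    scores = [-math.dist(x, cents[c]) for c in labels]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {c: e / total for c, e in zip(labels, exps)}

def pseudo_label(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=5):
    """Iteratively move confidently predicted unlabeled cases into the training set.

    threshold and max_rounds are illustrative assumptions, not values from the paper.
    Returns the final centroids, the grown training set, and the cases left unlabeled.
    """
    X_train, y_train = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(max_rounds):
        cents = centroids(X_train, y_train)  # refit on labeled + pseudo-labeled cases
        remaining, added = [], 0
        for x in pool:
            proba = predict_proba(cents, x)
            label = max(proba, key=proba.get)
            if proba[label] >= threshold:
                X_train.append(x)
                y_train.append(label)
                added += 1
            else:
                remaining.append(x)  # too uncertain; keep for a later round
        pool = remaining
        if added == 0 or not pool:
            break
    return centroids(X_train, y_train), X_train, y_train, pool

# Toy usage with two made-up features per case.
model, X_train, y_train, leftover = pseudo_label(
    X_lab=[[0.0, 0.0], [10.0, 10.0]],
    y_lab=[0, 1],
    X_unlab=[[0.5, 0.2], [9.5, 9.8], [5.0, 5.0]],
)
print(y_train, leftover)
```

In this toy run the two unlabeled points near a class centroid receive pseudo-labels, while the point midway between the classes stays unlabeled because its predicted probability never reaches the threshold. Keeping such ambiguous cases out of the training set is the usual guard against reinforcing wrong labels.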
Problem

Research questions and friction points this paper is trying to address.

Improving lung cancer prognosis with limited labeled CT scans
Enhancing prediction accuracy using semi-supervised learning
Reducing annotation costs while maintaining robust performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semi-supervised learning with pseudo-labeling
Radiomics feature extraction via PyRadiomics
SHAP analysis for prediction interpretability
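
As a rough illustration of what SHAP attributes to each feature, the sketch below computes exact Shapley values by enumerating feature coalitions. It is not the `shap` package API and not the authors' code: the `shapley_values` helper, the single-baseline treatment of "missing" features, and the toy linear model are assumptions made for this example. Practical SHAP implementations use approximations or model-specific algorithms because exact enumeration grows as 2^n in the number of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline).

    Features outside a coalition are replaced by their baseline value
    (a simplifying assumption). Only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Toy usage: a hypothetical linear risk score over three made-up features.
def toy_model(features):
    a, b, c = features
    return 2.0 * a - 3.0 * b + 0.5 * c

phi = shapley_values(toy_model, x=[1.0, 2.0, 4.0], baseline=[0.0, 0.0, 0.0])
print(phi)
```

For a linear model each attribution reduces to the coefficient times the feature's deviation from the baseline, and the attributions always sum to the difference between the prediction and the baseline prediction. That additivity is what lets per-feature SHAP values be read as contributions to a single prediction.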
Mohammad R. Salmanpour
Department of Radiology, University of British Columbia, Vancouver, BC, Canada; Department of Integrative Oncology, BC Cancer Research Institute, Vancouver, BC, Canada; Technological Virtual Collaboration (TECVICO Corp.), Vancouver, BC, Canada
Amir Hossein Pouria
graduated from Amirkabir University of Technology
Network science, Recommender Systems, Big Data, LLM, Medical Image Processing
Sonia Falahati
Technological Virtual Collaboration (TECVICO Corp.), Vancouver, BC, Canada; Electrical and Computer Engineering Department, Nooshirvani University of Technology, Babol, Iran
Shahram Taeb
Department of Radiology, School of Paramedical Sciences, Guilan University of Medical Sciences, Rasht, Iran
Somayeh Sadat Mehrnia
Department of Integrative Oncology, Breast Cancer Research Center, Motamed Cancer Institute, ACECR, Tehran, Iran
Ali Fathi Jouzdani
M.D. | School of Cognitive Sciences, IPM | Neuroscience & Neoplasia AI Research Group (NAIRG)
Neuromodulation, Neurosurgery, Neuro-oncology, Computational Neuroscience, Precision Medicine
Mehrdad Oveisi
The University of British Columbia
AI/ML, Education, Computational Biology, Biomedical Informatics, Data Science
Ilker Hacihaliloglu
Department of Radiology, Department of Medicine, University of British Columbia
Biomedical Engineering, Medical Image Processing, Ultrasound Image Processing, Image Guided Surgery and Therapy, Deep Learning f
Arman Rahmim
Professor of Radiology, Physics and Biomedical Engineering, University of British Columbia
computational imaging, molecular imaging, personalized cancer therapy, AI, theranostics