🤖 AI Summary
Conventional cancer screening is costly and relies on specialized modalities (e.g., imaging or genomics), limiting scalability. Method: We propose a large-scale risk prediction framework that uses sparse sequences of high-level medical events from routine electronic health records (EHRs). The approach combines survival analysis with gradient boosting, so it needs neither deep clinical data nor high-performance computing. It relies on survival-aware feature engineering and medical event sequence encoding. Results: Evaluated on a retrospective cohort of more than 1.1 million individuals from four regions of Russia, the method achieves an Average Precision of 22.8% ± 2.7% versus 15.1% ± 2.6% for the baselines, a relative improvement of about 51%. The cancer detection rate at TOP@1k improves 4.7–6.4× over the traditional healthcare risk estimation approach. In an oncologist-supervised experiment, up to 84 of every 1,000 selected individuals were confirmed cancer patients, compared with a typical age-specific Number Needed to Screen (NNS) estimate of 9 out of 1,000 for colorectal cancer.
📝 Abstract
Dedicated cancer screening methods are often costly, time-consuming, and hard to apply on a large scale. Advanced Artificial Intelligence (AI) methods greatly help cancer detection but require specific or deep medical data. These limitations prevent the mass implementation of cancer screening. Applying AI methods to mass, personalized assessment of cancer risk from the existing volume of Electronic Health Records (EHR) would therefore be a disruptive change for healthcare. This paper presents Can-SAVE, a novel cancer risk assessment method that combines a survival analysis approach with a gradient-boosting algorithm. It is highly accessible and resource-efficient, using only a sequence of high-level medical events. We tested the proposed method in a long-term retrospective experiment covering more than 1.1 million people in four regions of Russia. Can-SAVE significantly exceeds the baselines on the Average Precision metric: 22.8% $\pm$ 2.7% vs 15.1% $\pm$ 2.6%. An extensive ablation study also confirmed the proposed method's dominant performance. An experiment supervised by oncologists shows a reliable cancer patient detection rate of up to 84 out of 1000 selected. These results surpass the estimates for medical screening strategies; the typical age-specific Number Needed to Screen estimate is only 9 out of 1000 (for colorectal cancer). Overall, our experiments show a 4.7–6.4 times improvement in cancer detection rate (TOP@1k) compared to the traditional healthcare risk estimation approach.
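The two headline metrics above can be made concrete with a small sketch. The code below is not the Can-SAVE implementation; it is an illustrative, dependency-free Python example of how Average Precision and a TOP@k detection count are commonly computed from risk scores. The function names (`average_precision`, `detections_in_top_k`) and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def ranked_indices(scores):
    """Return patient indices sorted from highest to lowest risk score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def average_precision(y_true, scores):
    """Mean of precision@rank taken at the rank of every true positive."""
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(ranked_indices(scores), start=1):
        if y_true[i] == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def detections_in_top_k(y_true, scores, k):
    """Count confirmed cases among the k highest-risk patients (TOP@k)."""
    return sum(y_true[i] for i in ranked_indices(scores)[:k])


# Toy data (hypothetical): 1 = later diagnosed with cancer, 0 = not diagnosed.
y_true = [1, 0, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

ap = average_precision(y_true, scores)
top3 = detections_in_top_k(y_true, scores, k=3)
print(f"AP = {ap:.3f}, detections in top 3 = {top3}")
```

On the reported figures, the ratio of Average Precision values is 22.8 / 15.1 ≈ 1.51, which corresponds to a relative gain of about 51% over the baselines.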