🤖 AI Summary
Colorectal cancer (CRC) exhibits nonspecific early symptoms and low patient reporting rates, resulting in substantial diagnostic delays: only 14.4% of cases in the UK are diagnosed at Stage I—where 5-year survival reaches 80–95%—versus a stark decline to ~10% at Stage IV. To address this, we propose ColonScopeX, the first explainable multimodal AI system for CRC early detection. Our method integrates Savitzky-Golay–smoothed dynamic blood signal fingerprints with structured clinical metadata, employing a hybrid gradient-boosting and neural network architecture. Full interpretability is ensured via deep integration of SHAP and LIME for end-to-end decision traceability and clinical understandability. The model achieves an AUC of 0.92 and significantly improves Stage I detection sensitivity. Clinical validation by gastroenterology specialists yields a 94% acceptance rate for interpretability, supporting its suitability for scalable population-level screening.
📝 Abstract
Colorectal cancer (CRC) ranks as the second leading cause of cancer-related death and the third most prevalent malignant tumour worldwide. Early detection of CRC remains problematic because its symptoms are non-specific and often embarrassing, so patients frequently overlook them or hesitate to report them to clinicians. Crucially, the stage at which CRC is diagnosed strongly affects survival: the 5-year survival rate is 80–95% at Stage I but falls to around 10% at Stage IV. Unfortunately, in the UK, only 14.4% of cases are diagnosed at the earliest stage (Stage I). In this study, we propose ColonScopeX, a machine learning framework that applies explainable AI (XAI) methodologies to enhance the early detection of CRC and pre-cancerous lesions. Our approach employs a multimodal model that integrates blood-sample signal fingerprints, smoothed using the Savitzky-Golay algorithm, alongside comprehensive patient metadata, including medication history, comorbidities, age, weight and BMI. By leveraging XAI techniques, we aim to make the model's decision-making process transparent and interpretable, thereby fostering greater trust in and understanding of its predictions. The proposed framework could be utilised as a triage tool or as a screening tool for the general population. This research highlights the potential of combining diverse patient data sources with explainable machine learning to tackle critical challenges in medical diagnostics.
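The Savitzky-Golay smoothing step mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window length and polynomial order are assumed values (the abstract does not specify them), and the synthetic trace merely stands in for a real blood-signal fingerprint.

```python
import numpy as np
from scipy.signal import savgol_filter

def smooth_fingerprint(raw_signal, window_length=11, polyorder=3):
    """Smooth a 1-D blood-signal fingerprint with a Savitzky-Golay filter.

    window_length and polyorder are illustrative assumptions; the paper
    does not state the parameters actually used.
    """
    return savgol_filter(raw_signal, window_length=window_length,
                         polyorder=polyorder)

# Synthetic noisy trace standing in for a real fingerprint measurement.
rng = np.random.default_rng(seed=0)
t = np.linspace(0.0, 1.0, 200)
raw = np.sin(2 * np.pi * 3 * t) + 0.3 * rng.standard_normal(t.size)
smoothed = smooth_fingerprint(raw)
```

Savitzky-Golay fits a low-order polynomial within a sliding window, suppressing high-frequency noise while preserving peak shapes better than a simple moving average — a useful property when downstream features depend on fingerprint peaks.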