🤖 AI Summary
The molecular mechanisms underlying multiple sclerosis (MS) remain incompletely understood. This study develops an end-to-end machine learning pipeline that integrates bulk microarray and single-cell RNA-seq data from peripheral blood mononuclear cells and cerebrospinal fluid (CSF), training XGBoost classifiers to distinguish MS patients from healthy controls. By combining SHAP-based interpretability with differential expression analysis, the work generates mechanistic hypotheses at the multi-tissue transcriptomic level, including non-canonical immune checkpoints and virus-related pathways. The models perform best on CSF B cells (AUC = 0.94) and bulk microarray data (AUC = 0.86), and the analysis identifies candidate biomarkers linked to immune activation, the ubiquitin–proteasome system, and Epstein–Barr virus infection.
📝 Abstract
Multiple sclerosis (MS) is a chronic autoimmune disease of the central nervous system whose molecular mechanisms remain incompletely understood. In this study, we developed an end-to-end machine learning pipeline to analyze transcriptomic data from peripheral blood mononuclear cells and cerebrospinal fluid (CSF), integrating both bulk microarray and single-cell RNA sequencing datasets (focusing on CD4+ T cells and B cells). After rigorous preprocessing, batch correction, and gene declustering, XGBoost classifiers were trained to distinguish MS patients from healthy controls. An explainable AI method, SHapley Additive exPlanations (SHAP), was used to identify the key genes driving classification, and the results were compared with differential expression analysis (DEA). SHAP-prioritized genes were further investigated through interaction networks and pathway enrichment analyses. The models achieved strong performance, particularly on CSF B cells (AUC = 0.94) and bulk microarray data (AUC = 0.86). SHAP-based gene selection proved complementary to classical DEA. Gene clusters identified across multiple datasets highlighted immune activation, non-canonical immune checkpoints (ITK, CLEC2D, KLRG1, CEACAM1), ribosomal and translational programs, ubiquitin-proteasome regulation, lipid trafficking, and Epstein-Barr virus-related pathways. Our integrative and explainable framework yields insights that complement conventional analysis and provides novel mechanistic hypotheses and potential biomarkers for MS pathogenesis.
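For readers less familiar with the reported metric, the sketch below illustrates what an AUC value such as 0.94 measures: the probability that a randomly chosen MS sample receives a higher classifier score than a randomly chosen control, with ties counted as one half. It is a minimal standard-library illustration, not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are hypothetical and do not come from the study.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = MS, 0 = healthy control); not from the paper.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; on real datasets a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.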