Machine Learning for analysis of Multiple Sclerosis cross-tissue bulk and single-cell transcriptomics data

📅 2026-03-05
🤖 AI Summary
The molecular mechanisms underlying multiple sclerosis (MS) remain incompletely understood. This study develops an end-to-end machine learning pipeline that analyzes bulk microarray and single-cell RNA-seq data from peripheral blood mononuclear cells and cerebrospinal fluid (CSF), training XGBoost classifiers to distinguish MS patients from healthy controls. SHAP-based interpretability is compared with differential expression analysis, and the two gene selections prove complementary. The SHAP-prioritized genes suggest mechanistic hypotheses across tissues, including non-canonical immune checkpoints and virus-related pathways. The models perform best on CSF B cells (AUC = 0.94) and microarray data (AUC = 0.86), and the candidate biomarkers are linked to immune activation, the ubiquitin–proteasome system, and Epstein–Barr virus-related pathways.

📝 Abstract
Multiple Sclerosis (MS) is a chronic autoimmune disease of the central nervous system whose molecular mechanisms remain incompletely understood. In this study, we developed an end-to-end machine learning pipeline to analyze transcriptomic data from peripheral blood mononuclear cells and cerebrospinal fluid, integrating both bulk microarray and single-cell RNA sequencing datasets (concentrating on CD4+ and B-cells). After rigorous preprocessing, batch correction, and gene declustering, XGBoost classifiers were trained to distinguish MS patients from healthy controls. Explainable AI tools, namely SHapley Additive exPlanations (SHAP), were employed to identify key genes driving classification, and results were compared with Differential Expression Analysis (DEA). SHAP-prioritized genes were further investigated through interaction networks and pathway enrichment analyses. The models achieved strong performance, particularly in CSF B-cells (AUC=0.94) and microarray (AUC=0.86). SHAP gene selection proved to be complementary to classical DEA. Gene clusters identified across multiple datasets highlighted immune activation, non-canonical immune checkpoints (ITK, CLEC2D, KLRG1, CEACAM1), ribosomal and translational programs, ubiquitin-proteasome regulation, lipid trafficking, and Epstein-Barr virus-related pathways. Our integrative and explainable framework reveals complementary insights beyond conventional analysis and provides novel mechanistic hypotheses and potential biomarkers for MS pathogenesis.
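The abstract relies on SHAP to rank the genes that drive each classifier's prediction. As a rough illustration of the underlying idea only, the sketch below computes exact Shapley values by brute force for a tiny scoring function. It is not the authors' pipeline: the gene names, the `toy_score` function, and the all-zero baseline are hypothetical, and for XGBoost models the SHAP library uses tree-specific algorithms rather than this enumeration, which is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, sample, baseline):
    """Exact Shapley values for one sample, relative to a baseline sample.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features.
    """
    n = len(sample)
    features = range(n)

    def value(coalition):
        # Model output when only the features in `coalition` take the sample's values.
        x = [sample[i] if i in coalition else baseline[i] for i in features]
        return predict(x)

    phis = []
    for i in features:
        others = [j for j in features if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi += weight * (value(coalition | {i}) - value(coalition))
        phis.append(phi)
    return phis


# Hypothetical toy "classifier score" over three made-up genes:
# one additive effect (gene_a) and one interaction (gene_b with gene_c).
GENES = ["gene_a", "gene_b", "gene_c"]


def toy_score(x):
    a, b, c = x
    return 2.0 * a + b * c


sample = [1.0, 2.0, 3.0]    # expression values of one hypothetical sample
baseline = [0.0, 0.0, 0.0]  # hypothetical reference expression profile

phi = shapley_values(toy_score, sample, baseline)

# Rank genes by absolute contribution, as a SHAP-style prioritization would.
ranking = sorted(zip(GENES, phi), key=lambda pair: abs(pair[1]), reverse=True)
for gene, contribution in ranking:
    print(f"{gene}: {contribution:.3f}")
```

The contributions sum to the difference between the score of the sample and the score of the baseline, which is the property that lets per-gene attributions be read as shares of a single prediction. The interaction term is split evenly between the two genes involved, a hint at why attribution-based rankings can differ from per-gene differential expression tests.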
Problem

Research questions and friction points this paper is trying to address.

Multiple Sclerosis
transcriptomics
biomarker discovery
immune mechanisms
cross-tissue analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

explainable AI
XGBoost
SHAP
multi-omics integration
single-cell transcriptomics
Authors
Francesco Massafra
Department of Computer Science, University of Pisa, Pisa, Italy.
Samuele Punzo
Department of Computer Science, University of Pisa, Pisa, Italy.
Silvia Giulia Galfré
Department of Computer Science, University of Pisa, Pisa, Italy.
Alessandro Maglione
Department of Computer Science, University of Turin, Turin, Italy
Simone Pernice
Department of Computer Science, University of Turin, Turin, Italy; Laboratorio InfoLife, Consorzio Interuniversitario Nazionale per l’Informatica, Rome, Italy
Stefano Forti
Department of Computer Science, University of Pisa
cloud-edge continuum, distributed systems, green computing, automated reasoning
Simona Rolla
Department of Clinical and Biological Sciences, University of Turin, Turin, Italy
Marco Beccuti
Department of Computer Science, University of Turin, Turin, Italy; Laboratorio InfoLife, Consorzio Interuniversitario Nazionale per l’Informatica, Rome, Italy
Marinella Clerico
Department of Clinical and Biological Sciences, University of Turin, Turin, Italy
Corrado Priami
Department of Computer Science, University of Pisa, Pisa, Italy.
Alina Sîrbu
Computer Science Department, University of Pisa, Italy
computational biology, complex systems, machine learning, social network analysis, migration