AI Summary
Cortical lesion (CL) segmentation in multiple sclerosis (MS) remains clinically challenging because the lesions are small, hard to annotate, and not yet served by robust automated methods. To address this, we establish the first multi-center deep learning benchmark specifically for CLs, integrating 656 MRI scans acquired at 3T and 7T (MP2RAGE and MPRAGE sequences) from four institutions, and propose an nnU-Net variant tailored to CL characteristics. We conduct the first systematic out-of-distribution generalization evaluation across scanner types and sites, complemented by an interpretability analysis that examines how data heterogeneity and annotation ambiguity affect model performance. Our method achieves lesion-detection F1-scores of 0.64 (in-domain) and 0.50 (out-of-distribution). The code and models are released openly to support reproducibility, cross-site adaptation, and clinical translation of automated CL segmentation.
Abstract
Cortical lesions (CLs) have emerged as valuable biomarkers in multiple sclerosis (MS), offering high diagnostic specificity and prognostic relevance. However, their routine clinical integration remains limited by their subtle appearance on magnetic resonance imaging (MRI), the difficulty of expert annotation, and a lack of standardized automated methods. We propose a comprehensive multi-centric benchmark of CL detection and segmentation in MRI. A total of 656 MRI scans, including clinical trial and research data from four institutions, were acquired at 3T and 7T using MP2RAGE and MPRAGE sequences, with expert-consensus annotations. We rely on the self-configuring nnU-Net framework, designed for medical image segmentation, and propose adaptations tailored to improve CL detection. We evaluated model generalization through out-of-distribution testing, demonstrating strong lesion detection capabilities with F1-scores of 0.64 in-domain and 0.50 out-of-domain. We also analyze internal model features and model errors to better understand AI decision-making. Our study examines how data variability, lesion ambiguity, and protocol differences impact model performance, and offers recommendations for addressing these barriers to clinical adoption. To reinforce reproducibility, the implementation and models will be publicly accessible and ready to use at https://github.com/Medical-Image-Analysis-Laboratory/ and https://doi.org/10.5281/zenodo.15911797.
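The F1-scores above are reported for lesion detection, which is usually counted per lesion rather than per voxel. The abstract does not state the matching criterion, so the following is only a minimal sketch under assumptions: lesions are taken to be connected components of a binary mask (full neighborhood connectivity), and a lesion counts as detected when it shares at least one voxel with the other mask. The function names `connected_components` and `lesion_wise_f1` are hypothetical and are not taken from the released code.

```python
from collections import deque
from itertools import product


def connected_components(voxels):
    """Group foreground voxel coordinates (tuples of ints) into lesions.

    Assumption: full connectivity (26 neighbors in 3D, 8 in 2D).
    """
    remaining = set(voxels)
    components = []
    while remaining:
        seed = remaining.pop()
        # All non-zero offsets in {-1, 0, 1}^ndim.
        offsets = [o for o in product((-1, 0, 1), repeat=len(seed)) if any(o)]
        component = {seed}
        queue = deque([seed])
        while queue:
            current = queue.popleft()
            for offset in offsets:
                neighbor = tuple(c + d for c, d in zip(current, offset))
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def lesion_wise_f1(pred_voxels, ref_voxels):
    """Lesion-wise F1 with an any-overlap matching rule (an assumption)."""
    pred_set, ref_set = set(pred_voxels), set(ref_voxels)
    pred_lesions = connected_components(pred_set)
    ref_lesions = connected_components(ref_set)

    # Reference lesions touched by any predicted voxel are true positives.
    tp = sum(1 for lesion in ref_lesions if lesion & pred_set)
    fn = len(ref_lesions) - tp
    # Predicted lesions touching no reference voxel are false positives.
    fp = sum(1 for lesion in pred_lesions if not (lesion & ref_set))

    denominator = 2 * tp + fp + fn
    # Convention chosen here: no lesions in either mask counts as perfect.
    return 2 * tp / denominator if denominator else 1.0


# Toy example: two reference lesions, one detected, plus one spurious prediction.
ref = {(0, 0, 0), (0, 0, 1), (5, 5, 5)}
pred = {(0, 0, 1), (9, 9, 9)}
print(lesion_wise_f1(pred, ref))  # 0.5
```

In the toy example there is one true positive, one missed lesion, and one false positive, giving F1 = 2 / (2 + 1 + 1) = 0.5. Stricter matching rules (for example, a minimum overlap fraction or a minimum lesion size) would change the counts and are a design choice the abstract leaves open.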