AI Summary
Existing EEG biomarkers often generalize poorly across diverse populations and multicenter settings, because models can capture cohort-specific artifacts that undermine clinical reliability. To address this, the work proposes a systematic evaluation framework for cross-population EEG biomarkers that covers all 75 directional train-test configurations across five independent cohorts. The framework combines an n-gram expansion strategy for enumerating configurations with nested cross-validation and integrated channel selection, and it is supported by a theoretical analysis based on mixture risk optimization and hypothesis space contraction. Together these establish a population-aware assessment paradigm for evaluating robustness under distribution shift. The study finds that transferability across populations is asymmetric and that training on multiple cohorts promotes population-robust representations. With up to 94.1% accuracy on held-out cohorts, the results indicate that greater diversity in the training populations improves both model accuracy and biomarker stability.
Abstract
Developing robust and clinically reliable EEG biomarkers requires evaluation frameworks that explicitly address cross-population generalization in multi-site settings such as Parkinson's disease (PD) detection. Models trained under i.i.d. assumptions often capture population-specific artifacts rather than disease-relevant neural structure, leading to poor generalization across clinical cohorts. EEG further amplifies this challenge due to its low signal-to-noise ratio and heterogeneous acquisition conditions. We propose a population-aware evaluation framework to assess the robustness and clinical reliability of EEG biomarkers under distribution shift. Using an n-gram expansion strategy, we enumerate all cross-population train-test configurations across five independent cohorts, resulting in 75 directional evaluations. A nested cross-validation design with integrated channel selection ensures prospective biomarker identification without population leakage. Results show that cross-population transfer is asymmetric and that both accuracy and biomarker stability improve with increasing training population diversity, achieving up to 94.1% accuracy on held-out cohorts. A theoretical analysis based on mixture risk optimization and hypothesis space contraction explains these trends, showing that multi-population training promotes population-robust representations. This work establishes a principled framework for learning robust, generalizable, and clinically reliable EEG biomarkers for multi-site biomedical applications.
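The abstract does not spell out how five cohorts yield 75 directional evaluations. One reading that reproduces the count, offered here as an assumption rather than the authors' stated procedure, is that each cohort is held out in turn and every non-empty subset of the remaining four cohorts serves as a training set, giving 5 x (2^4 - 1) = 75 configurations. The sketch below enumerates them under that assumption; the cohort labels and the function name `enumerate_configs` are hypothetical.

```python
from itertools import combinations

# Hypothetical cohort labels; the source does not name the five cohorts.
COHORTS = ["A", "B", "C", "D", "E"]


def enumerate_configs(cohorts):
    """Return (train_cohorts, test_cohort) pairs.

    Assumption: every cohort is held out once, and every non-empty
    subset of the remaining cohorts is used as a training set.
    """
    configs = []
    for test in cohorts:
        others = [c for c in cohorts if c != test]
        # n is the number of training cohorts (1 up to all remaining).
        for n in range(1, len(others) + 1):
            for train in combinations(others, n):
                configs.append((train, test))
    return configs


configs = enumerate_configs(COHORTS)
# 5 held-out cohorts x (4 + 6 + 4 + 1) training subsets = 75
print(len(configs))
```

Under this reading, the configurations are directional because training on A and testing on B is counted separately from training on B and testing on A, which is what allows asymmetric transfer to be observed.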