Is Limited Participant Diversity Impeding EEG-based Machine Learning?

📅 2025-03-11

🤖 AI Summary

Does limited participant diversity constrain the generalisability and robustness of EEG-based machine learning models? This study examines participant homogeneity as a potential bottleneck for model performance and reports that participant distribution shift can severely limit how performance scales with data. Methodologically, it treats EEG data collection as a multi-level data generation process (participants, recordings, segments) and uses large-scale empirical studies to disentangle the effects of overall sample size and participant diversity. The same framework is then used to assess data augmentation and self-supervised learning as strategies for limited-data settings. The findings yield actionable guidance for EEG data collection and for ML research in low-data regimes.

📝 Abstract

The application of machine learning (ML) to electroencephalography (EEG) has great potential to advance both neuroscientific research and clinical applications. However, the generalisability and robustness of EEG-based ML models often hinge on the amount and diversity of training data. It is common practice to split EEG recordings into small segments, thereby increasing the number of samples substantially compared to the number of individual recordings or participants. We conceptualise this as a multi-level data generation process and investigate the scaling behaviour of model performance with respect to the overall sample size and the participant diversity through large-scale empirical studies. We then use the same framework to investigate the effectiveness of different ML strategies designed to address limited data problems: data augmentations and self-supervised learning. Our findings show that model performance scaling can be severely constrained by participant distribution shifts and provide actionable guidance for data collection and ML research.
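The multi-level structure described in the abstract (participants, recordings, segments) can be illustrated with a short sketch. This is not the authors' implementation: the function names, the window length, and the toy data are all assumptions chosen for illustration. It shows how segmenting inflates the sample count relative to the participant count, and why a train/test split is usually made at the participant level so that the test set reflects unseen participants.

```python
import random

def segment_recording(signal, window, step=None):
    """Split one recording (a list of samples) into fixed-length windows.
    `window` and `step` are hypothetical parameters, in samples."""
    step = step or window
    return [signal[i:i + window]
            for i in range(0, len(signal) - window + 1, step)]

def build_dataset(recordings, window):
    """recordings: dict mapping participant id -> list of recordings.
    Returns a flat list of (participant_id, segment) pairs."""
    samples = []
    for pid, recs in recordings.items():
        for rec in recs:
            for seg in segment_recording(rec, window):
                samples.append((pid, seg))
    return samples

def split_by_participant(samples, test_fraction=0.25, seed=0):
    """Hold out whole participants, so no participant appears in both sets."""
    pids = sorted({pid for pid, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(pids)
    n_test = max(1, round(len(pids) * test_fraction))
    test_pids = set(pids[:n_test])
    train = [s for s in samples if s[0] not in test_pids]
    test = [s for s in samples if s[0] in test_pids]
    return train, test

# Toy data (assumed): 4 participants, one 1000-sample recording each.
recordings = {f"p{i}": [[float(i)] * 1000] for i in range(4)}
samples = build_dataset(recordings, window=100)
train, test = split_by_participant(samples)
print(len(recordings), "participants ->", len(samples), "segments")
```

With these toy settings, 4 participants become 40 segments, yet the number of independent participants is unchanged, which is the distinction between sample size and participant diversity that the abstract draws.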

Problem

Research questions and friction points this paper is trying to address.

Does limited participant diversity impair the generalisability of EEG-based ML models?

How does model performance scale with overall sample size versus participant diversity?

How effective are data augmentation and self-supervised learning at addressing limited EEG data?

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-level data generation process for EEG

Evaluation of data augmentation under limited data, within the same framework

Evaluation of self-supervised learning for EEG-based ML, within the same framework
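One way to picture how a multi-level framework separates sample size from participant diversity is to subsample at both levels independently. The sketch below is a hypothetical illustration, not the paper's code: `subsample_multilevel` and the toy pool are assumed names and data. It builds two training subsets with the same total number of segments but different numbers of participants.

```python
import random
from collections import defaultdict

def subsample_multilevel(samples, n_participants, segments_per_participant, seed=0):
    """Draw participants first, then a fixed number of segments per participant.
    `samples` is a list of (participant_id, segment) pairs."""
    rng = random.Random(seed)
    by_pid = defaultdict(list)
    for pid, seg in samples:
        by_pid[pid].append(seg)
    chosen = rng.sample(sorted(by_pid), n_participants)
    subset = []
    for pid in chosen:
        segs = rng.sample(by_pid[pid], segments_per_participant)
        subset.extend((pid, s) for s in segs)
    return subset

# Toy pool (assumed): 8 participants with 40 segments each.
pool = [(f"p{i}", j) for i in range(8) for j in range(40)]

# Same total sample size (80 segments), different participant diversity.
low_diversity = subsample_multilevel(pool, n_participants=2, segments_per_participant=40)
high_diversity = subsample_multilevel(pool, n_participants=8, segments_per_participant=10)
print(len(low_diversity), len({p for p, _ in low_diversity}))
print(len(high_diversity), len({p for p, _ in high_diversity}))
```

Training the same model on both subsets and evaluating on held-out participants would then isolate the effect of diversity at a fixed sample size; sweeping both arguments gives a grid for studying scaling behaviour.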
