AI Summary
This study addresses the limited performance of reconstruction-based EEG foundation models in low-resource settings, where they often fail to surpass smaller supervised models due to representation bias. The work reveals, for the first time from a spectral decomposition perspective, that such models disproportionately emphasize aperiodic low-frequency components during self-supervised pretraining while severely neglecting high-frequency oscillatory information. Furthermore, their embedding spaces are susceptible to subject identity interference, which undermines the learning of task-relevant features. Through systematic experiments involving synthetic and real EEG data, linear probing evaluations, and spectral analyses, the study validates this mechanism and exposes a fundamental flaw in current reconstruction objectives, thereby providing critical theoretical insights for advancing EEG foundation models.
Abstract
EEG foundation models, pre-trained on large-scale unlabelled EEG data, have emerged as a promising direction towards learning generalizable EEG representations. Despite showing positive results in data-rich regimes, they often fail to outperform significantly smaller, fully supervised models in low-resource settings. We provide a mechanistic account of this shortcoming, attributing it to a fundamental mismatch between reconstruction-based pretext tasks and the idiosyncratic spectral structure of EEG signals, which decompose into distinct high-power aperiodic and low-power oscillatory components. Using controlled, synthetically generated EEG inputs, we demonstrate that EEG foundation model embeddings are biased to capture the aperiodic components of the EEG signal while under-representing oscillatory components, particularly at higher frequencies. Additionally, linear probe evaluations on real-world BCI datasets reveal that embeddings encode subject identity more strongly than task-relevant information, thereby reinforcing the low-frequency and aperiodic component bias in foundation model embeddings trained primarily on reconstruction-based objectives. Together, these findings elucidate a failure mode in reconstruction-based EEG foundation models and motivate future work to incorporate auxiliary losses explicitly targeting high-frequency oscillatory structure as a path toward more capable and generalizable EEG representations.
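To make the power mismatch between the two components concrete, the following is a minimal sketch in Python (NumPy only), not the authors' data generator. It builds a synthetic signal as the sum of a 1/f-type aperiodic component and a weak high-frequency oscillation, then measures how much of the total variance the oscillation carries. The function names and all parameter values (spectral exponent 2, a 40 Hz oscillation, amplitude 0.1, 256 Hz sampling) are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def aperiodic_noise(n_samples, fs, exponent, rng):
    """Unit-variance noise whose power spectrum falls off as 1 / f**exponent."""
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    scale = np.zeros_like(freqs)
    # Amplitude scales as f^(-exponent / 2); the DC bin is left at zero.
    scale[1:] = freqs[1:] ** (-exponent / 2.0)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=freqs.size)
    signal = np.fft.irfft(scale * np.exp(1j * phases), n=n_samples)
    return signal / signal.std()


def synthetic_eeg(n_samples=2560, fs=256.0, exponent=2.0,
                  osc_freq=40.0, osc_amp=0.1, seed=0):
    """Return the (aperiodic, oscillatory) components of a toy EEG-like signal.

    All defaults are hypothetical values chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) / fs
    aperiodic = aperiodic_noise(n_samples, fs, exponent, rng)
    oscillation = osc_amp * np.sin(2.0 * np.pi * osc_freq * t)
    return aperiodic, oscillation


aperiodic, oscillation = synthetic_eeg()
signal = aperiodic + oscillation

# Share of total variance carried by the oscillation.
osc_share = oscillation.var() / (aperiodic.var() + oscillation.var())

# Mean squared error of a "reconstruction" that drops the oscillation entirely.
mse_without_osc = np.mean((signal - aperiodic) ** 2)

print(f"oscillatory share of variance: {osc_share:.4f}")
print(f"MSE when the oscillation is ignored: {mse_without_osc:.4f}")
```

Under these assumed settings the oscillation accounts for well under 1% of the signal variance, so a model trained with a mean-squared reconstruction loss could ignore it almost for free. This is one plausible reading of the mismatch the abstract describes, offered as intuition rather than as the paper's own analysis.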