🤖 AI Summary
Current EEG foundation models face challenges including scarce labeled data, inefficient spatiotemporal modeling, excessive parameter counts, and inconsistent benchmarking—hindering reproducibility and generalizability. To address these, we propose CEReBrO, a lightweight EEG foundation model featuring a novel alternating attention mechanism that jointly captures intra-channel temporal dynamics and inter-channel spatial dependencies. We further introduce channel-wise patch-based tokenization to enhance computational efficiency and cross-task generalization. With parameter counts ranging from 3.6M to 85M, CEReBrO achieves 2× faster inference and 6× lower memory consumption compared to standard self-attention. It establishes new state-of-the-art (SOTA) performance on public benchmarks for emotion recognition and epilepsy detection, while also delivering strong results in abnormality classification and gait prediction. CEReBrO thus provides an efficient, general-purpose, and reproducible paradigm for EEG representation learning.
📝 Abstract
Electroencephalography (EEG) is a crucial tool for studying brain activity. Recently, self-supervised learning methods leveraging large unlabeled datasets have emerged as a potential solution to the scarcity of annotated EEG data. However, current methods suffer from at least one of the following limitations: i) sub-optimal EEG signal modeling, ii) model sizes in the hundreds of millions of trainable parameters, and iii) reliance on private datasets and/or inconsistent public benchmarks, hindering reproducibility. To address these challenges, we introduce a Compact Encoder for Representations of Brain Oscillations using alternating attention (CEReBrO), a new small EEG foundation model. Our tokenization scheme represents EEG signals at a per-channel patch granularity. We propose an alternating attention mechanism that jointly models intra-channel temporal dynamics and inter-channel spatial correlations, achieving a 2x speed improvement with 6x less memory required compared to standard self-attention. We present several model sizes ranging from 3.6 million to 85 million parameters. Pre-trained on over 20,000 hours of publicly available scalp EEG recordings with diverse channel configurations, our models set new benchmarks in emotion detection and seizure detection tasks, with competitive performance in anomaly classification and gait prediction. This validates our models' effectiveness and efficiency.
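The per-channel patch tokenization and alternating attention described above can be illustrated with a minimal NumPy sketch. This is our own simplified illustration, not the authors' implementation: function names like `tokenize` and `alternating_attention` are hypothetical, and learned projections, positional embeddings, and multi-head structure are omitted. The key idea shown is that alternating between within-channel (temporal) and across-channel (spatial) attention reduces each layer's cost from O((C·N)²) for full self-attention over all C·N tokens to O(C·N²) or O(N·C²).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x):
    # Single-head scaled dot-product self-attention over the
    # second-to-last axis of x (batched over leading axes).
    d = x.shape[-1]
    scores = x @ x.swapaxes(-1, -2) / np.sqrt(d)
    return softmax(scores) @ x

def tokenize(eeg, patch_len):
    # eeg: (channels C, time T) -> tokens: (C, N patches, patch_len),
    # i.e. each token is one patch of one channel (per-channel granularity).
    C, T = eeg.shape
    n = T // patch_len
    return eeg[:, : n * patch_len].reshape(C, n, patch_len)

def alternating_attention(tokens, n_layers=2):
    # tokens: (C, N, D). Even layers attend over the N patches within
    # each channel (temporal); odd layers attend over the C channels at
    # each patch position (spatial), by swapping the first two axes.
    x = tokens
    for layer in range(n_layers):
        if layer % 2 == 0:
            x = attention(x)                               # intra-channel, over N
        else:
            x = attention(x.swapaxes(0, 1)).swapaxes(0, 1) # inter-channel, over C
    return x

# Toy usage: 19-channel recording, 1000 samples, patches of 100 samples.
eeg = np.random.randn(19, 1000)
tokens = tokenize(eeg, patch_len=100)   # shape (19, 10, 100)
out = alternating_attention(tokens)     # shape preserved: (19, 10, 100)
```

In a real transformer each `attention` call would be a full block with Q/K/V projections, multiple heads, and residual connections; the sketch only demonstrates how the axis swap alternates the attention scope between time and channels.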