🤖 AI Summary
Existing EEG foundation models suffer from two key limitations: (1) they neglect the heterogeneity of spatial and temporal dependencies in EEG signals, and (2) they generalize poorly across multi-source, heterogeneous EEG data formats. To address these challenges, the authors propose CBraMod, a general-purpose foundation model for EEG decoding. The method introduces three core components: (1) a criss-cross Transformer backbone that decouples spatial and temporal dependency modeling through two parallel attention mechanisms; (2) an asymmetric conditional positional encoding scheme that adapts to diverse EEG formats varying in sampling rate, channel count, and experimental paradigm; and (3) a patch-based masked EEG reconstruction pretraining objective on a large EEG corpus. Evaluated on 10 downstream BCI tasks across 12 public datasets, CBraMod achieves state-of-the-art performance, demonstrating strong generalizability and robustness. The implementation is publicly available.
📝 Abstract
Electroencephalography (EEG) is a non-invasive technique for measuring and recording the brain's electrical activity, widely used in BCI and healthcare applications. Early EEG decoding methods relied on supervised learning and were limited to specific tasks and datasets, hindering model performance and generalizability. With the success of large language models, a growing body of studies has focused on EEG foundation models. However, these studies still leave open challenges. First, most existing EEG foundation models adopt a full EEG modeling strategy that models the spatial and temporal dependencies among all EEG patches jointly, ignoring that these dependencies are heterogeneous due to the unique structural characteristics of EEG signals. Second, existing EEG foundation models generalize poorly across a wide range of downstream BCI tasks because EEG data come in varying formats, which are challenging to adapt to. To address these challenges, we propose a novel foundation model called CBraMod. Specifically, we devise a criss-cross transformer as the backbone to thoroughly leverage the structural characteristics of EEG signals; it models spatial and temporal dependencies separately through two parallel attention mechanisms. We further adopt an asymmetric conditional positional encoding scheme that encodes the positional information of EEG patches and can be easily adapted to EEG with diverse formats. CBraMod is pre-trained on a very large EEG corpus through patch-based masked EEG reconstruction. We evaluate CBraMod on up to 10 downstream BCI tasks (12 public datasets). CBraMod achieves state-of-the-art performance across this wide range of tasks, demonstrating its strong capability and generalizability. The source code is publicly available at https://github.com/wjq-learning/CBraMod.
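The core idea of the criss-cross attention described above can be sketched in a few lines: EEG patches form a (channels × time) grid, and one attention path mixes patches across channels at each time step while a parallel path mixes patches across time within each channel. The sketch below is a minimal, hedged illustration in NumPy, not the authors' implementation: it assumes identity query/key/value projections, a simple sum of the two paths, and toy dimensions (19 channels, 30 one-second patches, feature dim 8); the real model would use learned projections, multi-head attention, and feed-forward layers.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over the second-to-last (sequence) axis.
    d = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ v

def criss_cross_block(x):
    """x: array of shape (C channels, T time patches, D features).

    Spatial path: for each time index, attend over the C channels.
    Temporal path: for each channel, attend over the T time patches.
    The two parallel paths are summed (a simplifying assumption here).
    """
    xs = np.swapaxes(x, 0, 1)                           # (T, C, D)
    spatial = np.swapaxes(attention(xs, xs, xs), 0, 1)  # back to (C, T, D)
    temporal = attention(x, x, x)                       # (C, T, D)
    return spatial + temporal

rng = np.random.default_rng(0)
x = rng.standard_normal((19, 30, 8))  # e.g., 19 channels, 30 one-second patches
y = criss_cross_block(x)
print(y.shape)  # (19, 30, 8): same grid shape, dependencies modeled separately
```

Compared with full attention over all C×T patches (quadratic in C·T), the two axis-wise paths attend over only C or T positions at a time, which is the structural prior the abstract refers to.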