🤖 AI Summary
Existing EEG foundation models reconstruct signals with limited fidelity because their neural tokenizers fail to preserve high-frequency dynamics. To address this, we propose NeuroRVQ, a multiscale tokenization framework for full-spectrum EEG that combines multiscale feature extraction, hierarchical residual vector quantization (RVQ), and a phase- and amplitude-aware loss function. NeuroRVQ supports accurate EEG reconstruction across all frequency bands. It achieves lower reconstruction error than existing tokenizers and outperforms existing large-scale EEG models on a variety of downstream tasks. By providing a high-quality discrete representation that serves as a strong prior, NeuroRVQ lays a foundation for generative EEG foundation models and for applications such as neural decoding and multimodal biosignal fusion.
📝 Abstract
Electroencephalography (EEG) captures neural activity across multiple temporal and spectral scales, yielding signals that are rich but complex for representation learning. Recently, EEG foundation models trained to predict masked signal tokens have shown promise for learning generalizable representations. However, their performance is limited by their signal tokenization modules: existing neural tokenizers fail to preserve high-frequency dynamics, which limits their ability to reconstruct EEG signals with high fidelity. We introduce NeuroRVQ, a scalable Large Brainwave Model (LBM) centered on a codebook-based tokenizer. Our tokenizer integrates (i) multi-scale feature extraction modules that capture the full neural frequency spectrum; (ii) hierarchical residual vector quantization (RVQ) codebooks for high-resolution encoding; and (iii) an EEG signal phase- and amplitude-aware loss function for efficient training. This design enables efficient EEG compression while supporting accurate reconstruction across all frequency bands, leading to robust generative masked modeling. Our empirical results demonstrate that NeuroRVQ achieves lower reconstruction error and outperforms existing LBMs on a variety of downstream tasks. More broadly, the NeuroRVQ tokenizer establishes a strong prior for codebook-based general-purpose brainwave models, enabling advances in neural decoding, generative modeling, and multimodal biosignal integration.
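To make the residual vector quantization idea concrete, here is a minimal NumPy sketch of generic RVQ. It is not the NeuroRVQ implementation: the random vectors stand in for encoder features of EEG patches, the codebooks are fit with a few plain k-means iterations, and all names (`fit_rvq`, `rvq_encode`, `rvq_decode`) and sizes are assumptions chosen for illustration. Each level quantizes what the previous levels left unexplained, so decoding with more levels gives a finer reconstruction.

```python
import numpy as np


def nearest(data, codebook):
    """Index of the closest codeword (squared Euclidean) for each row of data."""
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def fit_codebook(data, k, iters, rng):
    """Plain k-means: initialise from random rows, then alternate assign/update."""
    codebook = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        idx = nearest(data, codebook)
        for j in range(k):
            members = data[idx == j]
            if len(members):  # leave empty clusters at their old position
                codebook[j] = members.mean(axis=0)
    return codebook


def fit_rvq(x, n_levels, k, iters=10, seed=0):
    """Fit one codebook per level, each on the residual left by earlier levels."""
    rng = np.random.default_rng(seed)
    residual = x.copy()
    codebooks = []
    for _ in range(n_levels):
        codebook = fit_codebook(residual, k, iters, rng)
        residual = residual - codebook[nearest(residual, codebook)]
        codebooks.append(codebook)
    return codebooks


def rvq_encode(x, codebooks):
    """Return integer codes of shape (n_levels, n_vectors)."""
    residual = x.copy()
    codes = []
    for codebook in codebooks:
        idx = nearest(residual, codebook)
        residual = residual - codebook[idx]
        codes.append(idx)
    return np.stack(codes)


def rvq_decode(codes, codebooks):
    """Sum the selected codeword from every level."""
    return sum(cb[idx] for cb, idx in zip(codebooks, codes))


# Hypothetical stand-in for encoder outputs: 512 feature vectors of dimension 16.
x = np.random.default_rng(42).standard_normal((512, 16))
codebooks = fit_rvq(x, n_levels=4, k=64)
codes = rvq_encode(x, codebooks)

# Reconstruction error when decoding with only the first L levels.
errors = [
    float(np.mean((x - rvq_decode(codes[:L], codebooks[:L])) ** 2))
    for L in range(1, len(codebooks) + 1)
]
print(errors)
```

In this sketch each vector is stored as four small integers (one per level) rather than 16 floats, and the error on the fitting data does not increase as levels are added. A learned tokenizer would train the codebooks jointly with the encoder and decoder instead of using k-means on fixed features.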