DIVER-1: Deep Integration of Vast Electrophysiological Recordings at Scale

📅 2025-12-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current foundation models for EEG and intracranial EEG (iEEG) are constrained by limited scale, hindering performance gains. To address this, we construct the largest and most diverse electrophysiological dataset to date (59.3k hours of recordings from 17.7k subjects) and establish the first data-constrained scaling law for electrophysiology. Methodologically, we propose three key innovations: (1) any-variate attention, to flexibly handle heterogeneous channel counts; (2) sliding temporal conditional positional encoding, to capture dynamic temporal dependencies; and (3) multi-domain reconstruction, for joint learning across the signal, spectral, and time-frequency domains. Leveraging these, we train DIVER-1, a family of Transformer-based models, via large-scale distributed self-supervised pretraining. Evaluated on standard iEEG and EEG benchmarks, DIVER-1 achieves state-of-the-art performance. Our ablation and scaling studies further uncover efficient scaling trajectories and principled resource-allocation guidelines, providing both theoretical foundations and an engineering blueprint for next-generation neural electrophysiology foundation models.
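The architecture itself is not reproduced on this page, so the following is a minimal PyTorch sketch of the first innovation, any-variate attention, assuming the common formulation in which channel identity enters through learned binary attention biases (same-channel vs. cross-channel token pairs) rather than fixed per-channel embeddings. The class name, the bias scheme, and the omission of temporal position handling are illustrative assumptions, not DIVER-1's actual implementation.

```python
# Hedged sketch of any-variate attention: channel identity is injected via
# learned per-head biases on same-channel vs. cross-channel token pairs, so
# the same weights handle any number of recording channels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnyVariateAttention(nn.Module):
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        # Learned scalar bias per head for same-channel and cross-channel pairs.
        self.bias_same = nn.Parameter(torch.zeros(n_heads))
        self.bias_diff = nn.Parameter(torch.zeros(n_heads))

    def forward(self, x: torch.Tensor, channel_ids: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); channel_ids: (batch, tokens) integer channel index
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5  # (B, H, T, T)
        # Binary mask: does each token pair come from the same channel?
        same = (channel_ids[:, :, None] == channel_ids[:, None, :])
        same = same[:, None].float()  # (B, 1, T, T), broadcast over heads
        bias = self.bias_same.view(1, -1, 1, 1) * same \
             + self.bias_diff.view(1, -1, 1, 1) * (1.0 - same)
        attn = F.softmax(scores + bias, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, T, D)
        return self.proj(out)
```

Because the bias depends only on whether two tokens share a channel, the module is indifferent to how many channels a recording has, which is what would let one model ingest heterogeneous EEG/iEEG montages.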

📝 Abstract
Electrophysiology signals such as EEG and iEEG are central to neuroscience, brain-computer interfaces, and clinical applications, yet existing foundation models remain limited in scale despite clear evidence that scaling improves performance. We introduce DIVER-1, a family of EEG and iEEG foundation models trained on the largest and most diverse corpus to date (5.3k hours of iEEG and 54k hours of EEG; 1.6M channel-hours from over 17.7k subjects) and scaled up to 1.82B parameters. We present the first systematic scaling law analysis for this domain, showing that these models follow data-constrained scaling laws: for a given amount of data and compute, smaller models trained for extended epochs consistently outperform larger models trained briefly. This behavior contrasts with prior electrophysiology foundation models that emphasized model size over training duration. To achieve strong performance, we also design architectural innovations including any-variate attention, sliding temporal conditional positional encoding, and multi-domain reconstruction. DIVER-1 iEEG and EEG models each achieve state-of-the-art performance on their respective benchmarks, establishing concrete guidelines for efficient scaling and resource allocation in electrophysiology foundation model development.
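As context for the scaling claim: "data-constrained scaling law" usually refers to the parametric form of Muennighoff et al. (2023), in which repeated epochs contribute decaying but nonzero effective data. Whether DIVER-1 fits exactly this parameterization is an assumption, but the form makes the smaller-model, longer-training result concrete:

```latex
% Data-constrained scaling law (parametric form from Muennighoff et al., 2023;
% its use for DIVER-1 is an assumption). N = parameters, D = training data;
% N' is the analogously discounted effective parameter count.
L(N, D) = E + \frac{A}{N'^{\alpha}} + \frac{B}{D'^{\beta}},
\qquad
D' = U_D + U_D R_D^{*}\left(1 - e^{-R_D / R_D^{*}}\right)
```

Here U_D is the unique data budget (hours of recording), R_D the number of repetitions beyond the first epoch, and R_D^* a fitted decay constant. Because extra epochs keep paying off, at a fixed unique-data budget it can be compute-optimal to train a smaller model for more epochs rather than a larger model for fewer, which matches the paper's headline finding.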
Problem

Research questions and friction points this paper is trying to address.

Scaling electrophysiology foundation models with large datasets
Establishing data-constrained scaling laws for EEG/iEEG models
Designing architectural innovations for improved model performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Trained on largest EEG and iEEG corpus to date
Introduced data-constrained scaling laws for electrophysiology
Designed architectural innovations: any-variate attention, sliding temporal conditional positional encoding, and multi-domain reconstruction (loss sketch below)
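For concreteness, here is a hedged sketch of what a multi-domain reconstruction objective could look like, penalizing the reconstruction jointly in the raw-signal, spectral, and time-frequency domains. Function names, transforms, and weights are hypothetical, not the paper's exact losses.

```python
# Illustrative multi-domain reconstruction loss: compare prediction and target
# in the raw-signal domain, the spectral domain (FFT magnitude), and the
# time-frequency domain (STFT magnitude).
import torch
import torch.nn.functional as F

def _stft_mag(x: torch.Tensor, n_fft: int) -> torch.Tensor:
    # x: (batch, channels, time) -> STFT magnitude per channel
    win = torch.hann_window(n_fft, device=x.device)
    return torch.stft(x.flatten(0, 1), n_fft=n_fft, window=win,
                      return_complex=True).abs()

def multi_domain_loss(pred: torch.Tensor, target: torch.Tensor,
                      n_fft: int = 64,
                      w_signal: float = 1.0, w_spectral: float = 1.0,
                      w_tf: float = 1.0) -> torch.Tensor:
    # pred, target: (batch, channels, time)
    signal = F.mse_loss(pred, target)
    spectral = F.mse_loss(torch.fft.rfft(pred, dim=-1).abs(),
                          torch.fft.rfft(target, dim=-1).abs())
    tf = F.mse_loss(_stft_mag(pred, n_fft), _stft_mag(target, n_fft))
    return w_signal * signal + w_spectral * spectral + w_tf * tf
```

The intuition is that a signal-domain loss alone under-weights oscillatory structure, while spectral and time-frequency terms force the model to reconstruct band power and its evolution over time.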
Danny Dongyeop Han
Seoul National University, Seoul, Republic of Korea
Yonghyeon Gwon
Seoul National University, Seoul, Republic of Korea
Ahhyun Lucy Lee
Seoul National University, Seoul, Republic of Korea
Taeyang Lee
Seoul National University, Seoul, Republic of Korea
Seong Jin Lee
Seoul National University, Seoul, Republic of Korea
Jubin Choi
Seoul National University, Seoul, Republic of Korea
Sebin Lee
Master's Student @ 33Lab, Soongsil University
Human-computer Interaction, Computer Graphics, Virtual Reality, Virtual Entertainment
Jihyun Bang
Seoul National University, Seoul, Republic of Korea
Seungju Lee
Seoul National University, Seoul, Republic of Korea
David Keetae Park
Brookhaven National Laboratory
Spatiotemporal Learning, Machine Learning
Shinjae Yoo
Brookhaven National Laboratory
Machine Learning
Chun Kee Chung
Seoul National University, Seoul, Republic of Korea
Jiook Cha
Seoul National University
Human Neuroscience, Developmental Sciences, Machine Learning