🤖 AI Summary
Epilepsy research has long been hindered by the absence of large-scale, standardized intracranial electroencephalography (iEEG) datasets and unified evaluation benchmarks, which impedes model reproducibility, cross-center validation, and clinical translation. To address this gap, this work integrates multicenter data to construct the largest preoperative iEEG resource to date, comprising 178 hours of high-resolution recordings from 302 patients. For the first time, heterogeneous iEEG data from diverse sources are harmonized in format and metadata, accompanied by over 36,000 expert-validated pathological event annotations and comprehensive clinical metadata. Building on this resource, the work establishes a reproducible benchmark for clinically relevant tasks such as epileptogenic zone localization, and demonstrates the potential of end-to-end modeling on long iEEG segments and of transferring representations pretrained on non-neurophysiological domains, with the aim of improving model generalizability and clinical translatability.
📝 Abstract
Epilepsy affects over 50 million people worldwide, and one-third of patients have drug-resistant seizures, for which surgery offers the best chance of seizure freedom. Accurate localization of the epileptogenic zone (EZ) relies on intracranial EEG (iEEG). Clinical workflows, however, remain constrained by labor-intensive manual review. At the same time, existing data-driven approaches are typically developed on single-center datasets that are inconsistent in format and metadata, lack standardized benchmarks, and rarely release pathological event annotations, creating barriers to reproducibility, cross-center validation, and clinical relevance. After extensive effort to reconcile heterogeneous iEEG formats, metadata, and recordings across publicly available sources, we present $\textbf{Omni-iEEG}$, a large-scale, pre-surgical iEEG resource comprising $\textbf{302 patients}$ and $\textbf{178 hours}$ of high-resolution recordings. The dataset includes harmonized clinical metadata such as seizure onset zones, resections, and surgical outcomes, all validated by board-certified epileptologists. In addition, Omni-iEEG provides over 36K expert-validated annotations of pathological events, enabling robust biomarker studies. Omni-iEEG serves as a bridge between machine learning and epilepsy research: it defines clinically meaningful tasks with unified evaluation metrics grounded in clinical priors, enabling systematic evaluation of models in clinically relevant settings. Beyond benchmarking, we demonstrate the potential of end-to-end modeling on long iEEG segments and highlight the transferability of representations pretrained on non-neurophysiological domains. Together, these contributions establish Omni-iEEG as a foundation for reproducible, generalizable, and clinically translatable epilepsy research. The project page with dataset and code links is available at omni-ieeg.github.io/omni-ieeg.