🤖 AI Summary
This work addresses the lack of standardized evaluation protocols for AI models of brain recordings, where existing studies differ substantially in preprocessing, training and finetuning procedures, and task coverage. To bridge this gap, the authors introduce NeuralBench, a unified benchmarking framework, together with NeuralBench-EEG v1.0, a large standardized benchmark for electroencephalography (EEG). The benchmark covers 36 tasks, 14 deep learning architectures, and 94 datasets accessed through a standardized interface, enabling consistent comparison across tasks, datasets, and models. The framework is designed to accommodate other neuroimaging modalities, as illustrated by preliminary extensions to MEG and fMRI. The large-scale benchmark yields two main findings: current foundation models only marginally outperform task-specific models, and many tasks, such as cognitive decoding and clinical prediction, remain highly challenging even for the best models. The authors invite the community to expand this open-source framework toward a unified benchmarking standard for neuroimaging models.
📝 Abstract
Deep learning and large public datasets have recently catalyzed the proliferation of AI models for processing brain recordings. However, systematically evaluating these models remains a challenge: not only do preprocessing pipelines and training and finetuning approaches vary widely across studies, but downstream evaluation is also often limited to small sets of tasks and/or datasets. Here, we present NeuralBench: a unified framework for benchmarking AI models of brain activity. We accompany this framework with NeuralBench-EEG v1.0 -- a large electroencephalography (EEG) benchmark that includes 36 tasks and 14 deep learning architectures, evaluated on 94 datasets accessed through a standardized interface. This first EEG-focused release already highlights two main findings. First, current foundation models only marginally outperform task-specific models. Second, a large set of tasks (e.g., cognitive decoding, clinical predictions) remains highly challenging, even for the best models. Critically, NeuralBench is designed for the integration of new tasks, datasets, models, and neuroimaging modalities, as illustrated by preliminary extensions to MEG and fMRI datasets and models. Through this white paper, we invite the community to expand this open-source framework and work together toward a unified benchmarking standard for neuroimaging models.