How Much is Brain Data Worth for Machine Learning?

📅 2026-05-09
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study addresses the value of neuroimaging data in enhancing machine learning model performance and delineates the conditions under which collecting such data is justified. By constructing a linear Gaussian multimodal model, the authors theoretically analyze an estimator that fuses neural recordings with task labels, deriving scaling laws that characterize performance as a function of sample size. They establish, for the first time, an equivalence ratio quantifying the exchangeability between neural and task data. The work further elucidates how brain data improves model robustness under distribution shift, demonstrating that its utility hinges on task–brain alignment, noise levels, latent dimensionality, and sample size. These insights collectively define the regimes in which incorporating neural data is advantageous under a fixed data acquisition budget.
📝 Abstract
If a person can solve a task, can measuring their brain make it easier to train a model to solve that task too? Recent NeuroAI work suggests that supplementing task training with neural recordings can modestly improve model performance and robustness. However, it is unclear when there should be a benefit from using neural data and how much benefit to expect. We formulate this question mathematically and begin to address it theoretically using a simple, analytically tractable linear Gaussian model of task targets and neural recordings. For a multimodal estimator trained on both brain data and task labels, we derive scaling laws for how performance scales with the numbers of brain and task samples. From these laws we derive relative value and exchange rates between brain samples and task samples, quantifying how many extra task samples neural data is worth as a function of task-brain alignment, neural and task noise, latent dimension, and brain data sample size. We also analyze test distribution shift to identify conditions where brain-regularized learning can produce substantial robustness gains through learned invariances. Finally, under a fixed collection budget, we characterize the regimes in which brain data is worth collecting. Our results provide a foundation for understanding how valuable brain data could be for improving machine learning.
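To make the abstract's setup concrete, here is a minimal toy simulation in the same spirit. It is a sketch under stated assumptions, not the paper's model or estimator: inputs are isotropic Gaussian, the brain recording is a noisy scalar readout of one input direction, the task weight vector has cosine `alignment` with that direction, and both fits use a simple moment estimator chosen only so the example needs no external libraries. All names (`excess_risk`, `moment_estimate`, `alignment`, and the default noise levels) are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def moment_estimate(xs, targets):
    """Estimate w in target = w.x + noise as (1/n) * sum(target_i * x_i).

    This is unbiased only because the inputs are assumed to be N(0, I).
    """
    n, d = len(xs), len(xs[0])
    return [sum(t * x[j] for x, t in zip(xs, targets)) / n for j in range(d)]


def excess_risk(n_task, n_brain, d=20, alignment=1.0,
                task_noise=0.5, brain_noise=0.5, seed=0):
    """Return (task-only risk, brain-informed risk) for one simulated dataset.

    Risk is the squared distance between the estimated and true task weights,
    which equals the excess test error for N(0, I) inputs.
    """
    rng = random.Random(seed)
    # Assumed ground truth: the brain reads out the first coordinate, and the
    # task weight vector has cosine `alignment` with that direction.
    w_brain = [1.0] + [0.0] * (d - 1)
    w_task = [alignment, math.sqrt(1.0 - alignment ** 2)] + [0.0] * (d - 2)

    def sample(n, w, noise):
        xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
        ts = [dot(w, x) + rng.gauss(0, noise) for x in xs]
        return xs, ts

    # Task-only baseline: estimate all d weights from task labels alone.
    x_task, y_task = sample(n_task, w_task, task_noise)
    w_only = moment_estimate(x_task, y_task)
    risk_only = sum((a - b) ** 2 for a, b in zip(w_only, w_task))

    # Brain-informed estimator: learn a unit direction from brain samples,
    # then fit a single gain along that direction using the task samples.
    x_brain, b_brain = sample(n_brain, w_brain, brain_noise)
    u = moment_estimate(x_brain, b_brain)
    norm = math.sqrt(dot(u, u))
    u = [a / norm for a in u]
    proj = [dot(u, x) for x in x_task]
    gain = dot(proj, y_task) / dot(proj, proj)
    w_fused = [gain * a for a in u]
    risk_fused = sum((a - b) ** 2 for a, b in zip(w_fused, w_task))
    return risk_only, risk_fused
```

Calling `excess_risk(n_task=40, n_brain=2000)` compares the two estimators when the task and brain directions coincide; rerunning with `alignment=0.0` and a larger `n_task` illustrates the opposite regime, where restricting the fit to the brain-derived direction hurts. Sweeping `n_task` until the task-only risk matches the brain-informed risk gives a rough, simulation-based analogue of the exchange rate the abstract describes.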
Problem

Research questions and friction points this paper is trying to address.

NeuroAI
brain data value
machine learning
scaling laws
distribution shift
Innovation

Methods, ideas, or system contributions that make the work stand out.

NeuroAI
scaling laws
brain-data value
multimodal learning
distributional robustness
Lane Lewis
Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA; Department of Machine Learning, Carnegie Mellon University, Pittsburgh, PA, USA; NSF AI Institute for Artificial and Natural Intelligence (ARNI)
Zhixin Wang
Zhejiang University
RL systems
David Schwab
Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA; NSF AI Institute for Artificial and Natural Intelligence (ARNI); CUNY Graduate Center, New York, NY, USA
Xaq Pitkow
Associate Professor of Computational Neuroscience, Carnegie Mellon University
Computational Neuroscience
Theoretical Neuroscience
Machine Learning
Control Theory