🤖 AI Summary
The classical Turing Test evaluates only behavioral human-likeness and ignores whether a model's internal representations resemble those of the human brain. This paper introduces the "NeuroAI Turing Test", an evaluation framework that jointly imposes empirically testable behavioral and neural criteria: a model must not only match human behavior but also produce internal representations that are indistinguishable from measured human brain activity (e.g., fMRI or EEG), up to the natural variability between individuals. Methodologically, it integrates cross-modal neural decoding, representational similarity analysis (RSA), individualized brain signal modeling, and computational neurovalidation. The framework moves NeuroAI evaluation from qualitative, heuristic reasoning to a quantitative equivalence benchmark grounded in empirical neuroscience. By anchoring assessment to the human brain as the reference system, it aims to establish a reproducible, comparable, and biologically constrained evaluation paradigm.
📝 Abstract
What makes an artificial system a good model of intelligence? The classical test proposed by Alan Turing focuses on behavior, requiring that an artificial agent's behavior be indistinguishable from that of a human. While behavioral similarity provides a strong starting point, two systems with very different internal representations can produce the same outputs. Thus, in modeling biological intelligence, the field of NeuroAI often aims to go beyond behavioral similarity and achieve representational convergence between a model's activations and the measured activity of a biological system. This position paper argues that the standard definition of the Turing Test is incomplete for NeuroAI, and proposes a stronger framework called the "NeuroAI Turing Test", a benchmark that extends beyond behavior alone and additionally requires models to produce internal neural representations that are empirically indistinguishable from those of a brain up to measured individual variability, i.e., the difference between a computational model and the brain is no more than the difference between one brain and another. While the brain is not necessarily the ceiling of intelligence, it remains the only universally agreed-upon example, making it a natural reference point for evaluating computational models. By proposing this framework, we aim to shift the discourse from loosely defined notions of brain inspiration to a systematic and testable standard centered on both behavior and internal representations, providing a clear benchmark for neuroscientific modeling and AI development.
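The representational criterion stated in the abstract (model-to-brain difference no larger than brain-to-brain difference) can be illustrated with a small sketch. This is not the paper's procedure: the distance measure (one minus the Pearson correlation between representational dissimilarity matrices), the use of simple means as the comparison statistic, and the function names `rdm_vector`, `rdm_distance`, and `neuroai_turing_check` are all assumptions chosen for illustration, and the behavioral half of the test is not covered here.

```python
from itertools import combinations

import numpy as np


def rdm_vector(responses):
    """Upper triangle of a correlation-distance RDM.

    responses: array of shape (n_stimuli, n_units); units may be voxels,
    channels, or model activations (assumed layout, not from the paper).
    """
    rdm = 1.0 - np.corrcoef(responses)           # (n_stimuli, n_stimuli)
    upper = np.triu_indices(rdm.shape[0], k=1)   # skip the zero diagonal
    return rdm[upper]


def rdm_distance(a, b):
    """Assumed distance: 1 - Pearson correlation between two RDMs."""
    return 1.0 - np.corrcoef(rdm_vector(a), rdm_vector(b))[0, 1]


def neuroai_turing_check(model, subjects):
    """Compare the model-to-brain distance with the brain-to-brain distance.

    model: (n_stimuli, n_model_units) activations for the same stimuli.
    subjects: list of (n_stimuli, n_units) arrays, one per individual.
    """
    brain_to_brain = [rdm_distance(a, b) for a, b in combinations(subjects, 2)]
    model_to_brain = [rdm_distance(model, s) for s in subjects]
    inter_mean = float(np.mean(brain_to_brain))
    model_mean = float(np.mean(model_to_brain))
    return {
        "model_to_brain": model_mean,
        "brain_to_brain": inter_mean,
        # Illustrative pass rule: the model is no further from the brains
        # than the brains are from each other, on average.
        "passes": model_mean <= inter_mean,
    }


if __name__ == "__main__":
    # Synthetic stand-in data: subjects share a latent stimulus structure.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(30, 5))
    subjects = [
        latent @ rng.normal(size=(5, 40)) + 0.1 * rng.normal(size=(30, 40))
        for _ in range(4)
    ]
    unrelated_model = rng.normal(size=(30, 64))
    print(neuroai_turing_check(unrelated_model, subjects))
```

Because the comparison happens between stimulus-by-stimulus dissimilarity matrices, the model and each subject may have different numbers of units, as long as all of them respond to the same stimuli in the same order. A real evaluation would also need a noise-aware statistic and a significance test rather than a bare comparison of means.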