🤖 AI Summary
This study identifies a pervasive "repeated-stimulus confound" in EEG-based neural decoding: when identical stimuli appear in both training and test sets, stimulus identity acts as a non-neural confound that systematically inflates model performance estimates. The authors formally define the phenomenon and quantify its impact through controlled experiments, rigorous cross-validation, and replication analyses across multiple published studies, revealing accuracy overestimations of 4.46% to 7.42%. Critically, for each 1% increase in accuracy measured under the confound, the overestimation grows by a further 0.26%. This bias undermines the validity of key neuroscientific conclusions and risks lending spurious support to pseudoscientific claims. The work establishes stimulus independence as a methodological imperative for robust neural-decoding evaluation and introduces a practical validation framework, including stimulus-shuffled baselines and correction protocols, to mitigate the confound. In doing so, it advances EEG decoding research toward greater rigor, transparency, and reproducibility.
📝 Abstract
In neural-decoding studies, recordings of participants' responses to stimuli are used to train models. In recent years, there has been an explosion of publications applying innovations from deep-learning research to neural decoding. The data-hungry models used in these experiments have created a demand for increasingly large datasets. Consequently, some studies present the same stimuli multiple times to each participant to increase the number of trials available for model training. However, when a decoding model is trained and subsequently evaluated on responses to the same stimuli, stimulus identity becomes a confounder for accuracy. We term this the repeated-stimulus confound. We identify a susceptible dataset and 16 publications that report model performance based on evaluation procedures affected by the confound. We conducted experiments using models from the affected studies to investigate the likely extent to which results in the literature have been misreported. Our findings suggest that the decoding accuracies of these models were overestimated by between 4.46% and 7.42%. Our analysis also indicates that for every 1% increase in accuracy under the confound, the magnitude of the overestimation increases by 0.26%. The confound not only yields optimistic estimates of decoding performance but also undermines the validity of several claims made within the affected publications. We conducted further experiments to investigate the implications of the confound in alternative contexts, and found that the same methodology used within the affected studies could also be used to justify an array of pseudoscientific claims, such as the existence of extrasensory perception.
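The remedy the abstract implies, evaluating only on responses to stimuli that never appear in training, amounts to splitting trials by stimulus identity rather than at random. The sketch below is purely illustrative and not taken from the paper; the function name and data layout are assumptions (scikit-learn's `GroupKFold`, grouping by stimulus ID, provides the same guarantee in practice).

```python
import random

def stimulus_independent_split(trials, test_fraction=0.2, seed=0):
    """Split (stimulus_id, response) trials so that no stimulus ID
    appears in both the training and the test set.

    Illustrative helper, not the authors' code: repeated presentations
    of a held-out stimulus all land on the same side of the split,
    so stimulus identity cannot leak from training into evaluation.
    """
    rng = random.Random(seed)
    stimulus_ids = sorted({sid for sid, _ in trials})
    rng.shuffle(stimulus_ids)
    n_test = max(1, int(len(stimulus_ids) * test_fraction))
    test_ids = set(stimulus_ids[:n_test])
    train = [t for t in trials if t[0] not in test_ids]
    test = [t for t in trials if t[0] in test_ids]
    return train, test

# Ten stimuli, each presented three times (the repeated-stimulus setting).
trials = [(sid, f"resp_{sid}_{rep}") for sid in range(10) for rep in range(3)]
train, test = stimulus_independent_split(trials)

# Train and test share no stimulus IDs, unlike a naive random trial split.
assert {s for s, _ in train}.isdisjoint({s for s, _ in test})
```

A naive random split over trials would, with high probability, place some presentations of a stimulus in training and others in test, which is exactly the leakage the paper describes.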