What Are We Actually Decoding? Source Attribution for Non-Invasive Brain-to-Language Retrieval

📅 2026-05-23

🤖 AI Summary

Reported gains in non-invasive brain-to-language decoding are often confounded by sources other than stimulus-evoked neural activity, such as decoder priors, embedding-based metrics, and signal duration, which makes them hard to attribute. This work recasts stimulus-locked MEG-to-audio retrieval as an auditable source-attribution framework that separates performance into structural shortcuts, window-level stimulus-locked evidence, and cross-window contextual aggregation. Enforcing fixed-duration windows and stimulus-identity splits isolates structural leakage: signal-blind Gaussian noise reaches 66.3% Rank@1 under variable-length decoding but falls to near chance under these controls. An oracle sentence-bucket diagnostic then shows that 95.7% of Top-1 errors select the wrong sentence, localising the residual bottleneck to sentence-level competition. Group Context Bias (GCB), an inference-time additive logit bias, is used as a controlled intervention to measure the contribution of contextual aggregation. With GCB, Rank@1 shifts from 44% to 52% on the Gwilliams dataset and from 22% to 29% on MOUS; the effect collapses under random-grouping perturbations and vanishes when local evidence is attenuated, which supports its use as a source-attribution intervention.
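The duration shortcut mentioned above can be illustrated with a toy example. The sketch below is not the paper's decoder or data: `toy_scores` is a hypothetical length-sensitive scorer, the queries are pure Gaussian noise, and giving every item a distinct duration deliberately exaggerates the leak. It only shows how a signal-blind query can retrieve its target when durations vary, and why fixing the window length removes that advantage.

```python
import numpy as np

def rank_at_1(scores, target):
    """Fraction of queries whose top-scoring candidate is the target."""
    return float(np.mean(np.argmax(scores, axis=1) == np.asarray(target)))

def toy_scores(queries, candidates):
    """Hypothetical scorer (an assumption, not the paper's model):
    penalise duration mismatch, plus a weak content term computed
    on the overlapping samples."""
    scores = np.empty((len(queries), len(candidates)))
    for i, q in enumerate(queries):
        for j, c in enumerate(candidates):
            n = min(len(q), len(c))
            content = np.corrcoef(q[:n], c[:n])[0, 1]
            scores[i, j] = -abs(len(q) - len(c)) + 0.01 * content
    return scores

rng = np.random.default_rng(0)
n_items = 50
target = np.arange(n_items)  # query i should retrieve candidate i

# Variable-length setting: every item has its own duration, and the
# noise query shares nothing with its target except that duration.
var_lengths = rng.permutation(np.arange(100, 100 + n_items))
var_cands = [rng.standard_normal(n) for n in var_lengths]
var_queries = [rng.standard_normal(n) for n in var_lengths]

# Fixed-duration control: all windows have the same length.
fixed_cands = [rng.standard_normal(100) for _ in range(n_items)]
fixed_queries = [rng.standard_normal(100) for _ in range(n_items)]

variable_r1 = rank_at_1(toy_scores(var_queries, var_cands), target)
fixed_r1 = rank_at_1(toy_scores(fixed_queries, fixed_cands), target)

print(f"noise R@1, variable length: {variable_r1:.2f}")  # 1.00 by construction
print(f"noise R@1, fixed length:    {fixed_r1:.2f}")     # close to chance (1/50)
```

In this toy setting the noise query wins purely on duration, so any score it earns under variable-length decoding is structural rather than neural. The paper's reported 66.3% figure comes from its own pipeline and is not reproduced here.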

📝 Abstract

In non-invasive neural language decoding, results can be inflated by sources that are not stimulus-evoked neural evidence: decoder priors, embedding-based metrics, and non-neural structural nuisances such as signal duration. The methodological challenge is therefore attribution: a reported gain is more informative when it can be traced to a specific source. We recast stimulus-locked MEG-to-audio retrieval as an auditing framework that separates apparent performance into three sources - structural shortcuts, window-level stimulus-locked evidence, and cross-window contextual aggregation - and provides a diagnostic for each. Signal-blind Gaussian noise reaches 66.3% Rank@1 (R@1) under variable-length decoding but collapses to near chance once fixed-duration windows and stimulus-identity splits are enforced, isolating structural leakage. Under these controls, fixed-window retrieval recovers measurable MEG-audio discriminability, while an oracle sentence-bucket diagnostic shows that 95.7% of Top-1 errors select the wrong sentence, localising the residual bottleneck to sentence-level competition. We audit this contextual source with Group Context Bias (GCB), an inference-time additive logit bias that pools sentence-consistent evidence across windows while leaving the base retrieval scores and candidate pool fixed. Used as a score-space intervention, GCB makes the contextual source measurable: R@1 shifts from 44% to 52% on Gwilliams and from 22% to 29% on MOUS under the same fixed setting. GCB is auditable under this design: its effect collapses under random-grouping perturbations and vanishes when local evidence is attenuated in MEG or is near chance in EEG, supporting its use as a controlled source-attribution intervention. These results suggest that brain-to-language performance should be source-attributed, not merely reported.
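A minimal sketch of how an additive, inference-time group bias of this kind might look is given below. The abstract does not state the exact GCB formula, so the pooling rule (mean score per sentence, averaged over the windows in a group), the weight `alpha`, and the names `group_context_bias`, `query_group`, and `cand_sentence` are all assumptions made for illustration. What the sketch preserves from the description is that the base scores and the candidate pool stay fixed and only a sentence-consistent bias is added in score space.

```python
import numpy as np

def rank_at_1(scores, target):
    """Fraction of queries whose top-scoring candidate is the target."""
    return float(np.mean(np.argmax(scores, axis=1) == np.asarray(target)))

def group_context_bias(scores, query_group, cand_sentence, alpha=1.0):
    """Hypothetical GCB-style bias (pooling rule is an assumption).

    scores:        (n_queries, n_candidates) base logits, not modified.
    query_group:   (n_queries,) group id of each query window.
    cand_sentence: (n_candidates,) sentence id of each candidate window.
    Returns biased logits over the same candidate pool.
    """
    scores = np.asarray(scores, dtype=float)
    query_group = np.asarray(query_group)
    cand_sentence = np.asarray(cand_sentence)
    sentences = np.unique(cand_sentence)

    # Per-window evidence for each sentence: mean score of its candidates.
    sent_evidence = np.stack(
        [scores[:, cand_sentence == s].mean(axis=1) for s in sentences], axis=1
    )
    # Map every candidate to the column of its sentence.
    sent_index = np.searchsorted(sentences, cand_sentence)

    biased = scores.copy()
    for g in np.unique(query_group):
        members = query_group == g
        pooled = sent_evidence[members].mean(axis=0)  # pool across the group
        biased[members] += alpha * pooled[sent_index]  # additive logit bias
    return biased

# Toy data: 2 sentences with 2 candidate windows each; query i targets candidate i.
base = np.array([
    [3.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.2, 0.0],  # narrowly prefers a window from the wrong sentence
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
])
target = np.arange(4)
cand_sentence = np.array([0, 0, 1, 1])
true_groups = np.array([0, 0, 1, 1])      # windows grouped by sentence
shuffled_groups = np.array([0, 1, 0, 1])  # grouping perturbation control

print(rank_at_1(base, target))                                                       # 0.75
print(rank_at_1(group_context_bias(base, true_groups, cand_sentence), target))      # 1.0
print(rank_at_1(group_context_bias(base, shuffled_groups, cand_sentence), target))  # 0.75
```

In this toy case the bias corrects the one wrong-sentence error only when windows are grouped consistently; with the shuffled grouping the pooled evidence no longer favours the right sentence and the gain disappears. That mirrors the perturbation logic described in the abstract, not its reported numbers.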

Problem

Research questions and friction points this paper is trying to address.

source attribution

neural decoding

structural shortcuts

stimulus-locked evidence

contextual aggregation

Innovation

Methods, ideas, or system contributions that make the work stand out.

source attribution

MEG-to-audio retrieval

Group Context Bias