🤖 AI Summary
This study addresses the limited trustworthiness of EEG foundation models (EEG-FMs) in clinical diagnostics and brain-computer interfaces, which stems from their opaque decision-making. The authors apply attention-aware Layer-wise Relevance Propagation (LRP), previously used on CNN-based EEG models, to Transformer-based EEG-FMs as a post-hoc attribution method. In motor imagery tasks, the method exposes "Clever Hans" behavior, where models rely on task-correlated ocular signals rather than the intended motor correlates. In affect prediction under a naturalistic paradigm, it reveals a recurring reliance on a central electrode cluster, which suggests a candidate sensorimotor signature of arousal. Although heatmap interpretation remains ambiguous, these findings indicate that LRP can support both model verification and hypothesis generation in neuroscience.
📝 Abstract
Emerging foundation models (FMs) in electroencephalography (EEG) promise a path to scale deep learning in diagnostics and brain-computer interfaces despite data scarcity, yet their opaque nature remains a barrier to wider adoption. We investigate attention-aware layer-wise relevance propagation (LRP) as a post-hoc attribution method for EEG-FMs, extending LRP's use on convolutional neural network (CNN)-based EEG models to the Transformer architectures on which current FMs are based. We find that LRP can both verify EEG-FM decisions and surface novel, biologically plausible hypotheses from them. In motor imagery, it unmasks "Clever Hans" behavior in which models prioritize task-correlated ocular signals over the intended motor correlates. In a naturalistic paradigm for affect prediction, it reveals a recurring reliance on a central electrode cluster, suggesting a candidate sensorimotor signature of arousal. Though heatmap interpretation remains ambiguous in this complex domain, the results position LRP as a tool for both verification and exploration of EEG-FMs, a role that will grow in both importance and discovery potential as the underlying models mature.
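For readers unfamiliar with LRP, the sketch below illustrates the generic LRP-epsilon rule on a single dense layer. It is not the authors' attention-aware variant, and it does not reproduce any part of their pipeline. The function name `lrp_epsilon_dense`, the toy weights, and the choice of `eps` are all assumptions made for illustration.

```python
import numpy as np

def lrp_epsilon_dense(a, W, b, R_out, eps=1e-6):
    """Redistribute output relevance R_out onto the inputs of a dense layer.

    Generic LRP-epsilon rule (illustrative, not the paper's method):
        R_j = a_j * sum_k W[j, k] * R_k / (z_k + eps * sign(z_k))
    """
    z = a @ W + b                                   # pre-activations, shape (n_out,)
    z_stab = z + eps * np.where(z >= 0, 1.0, -1.0)  # keep the denominator away from zero
    s = R_out / z_stab                              # relevance per unit of pre-activation
    return a * (W @ s)                              # input relevance, shape (n_in,)

# Hypothetical toy layer: 3 inputs (think of 3 electrodes), 2 output units.
a = np.array([1.0, 2.0, -1.0])
W = np.array([[1.0, 0.0],
              [0.5, -1.0],
              [1.0, 1.0]])
b = np.zeros(2)

# Explain output unit 1 only: all relevance starts there.
R_out = np.array([0.0, 1.0])
R_in = lrp_epsilon_dense(a, W, b, R_out)

# With zero bias and a small eps, relevance is approximately conserved.
print(R_in, R_in.sum())
```

In this toy case the first input receives no relevance because its weight to the explained unit is zero, and the total input relevance stays close to the total output relevance. Applying such rules layer by layer yields a per-input heatmap. Transformer models need additional handling for attention and normalization layers, which this sketch does not cover.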