🤖 AI Summary
This study investigates whether the representations that modern EEG foundation models learn from raw signals align with 63 established clinical features. Using layer-wise ridge regression probes, LEACE-style subspace erasure, and a transparent classifier benchmarked against a random-feature baseline, the authors analyze which features are encoded and which are causally used across three models and five clinical tasks, yielding 945 (model, task, feature) combinations. The work presents the first systematic dissection of the correspondence between internal representations in EEG foundation models and classical clinical features, and identifies 50 features that are representation-causal in all three models on at least two tasks. Of the 945 combinations, 68.6% are representation-causal, and the confirmed features recover an average of 79.3% of the foundation models' advantage over the random baseline, ranging from approximately 0.99 on the depression (MDD) task to approximately 0.56 on the stress task.
📝 Abstract
Clinical electroencephalogram (EEG) analysis rests on a hand-crafted feature catalog refined over decades, \emph{e.g.,} band power, connectivity, and complexity. Modern EEG foundation models bypass this catalog, learn directly from raw signals via self-supervised pretraining, and match or outperform feature-engineered baselines on most clinical benchmarks. Whether the two representations align is an open question, which we decompose into three sub-questions: \emph{what does the model learn}, \emph{what does the model use}, and \emph{how much can be explained}. We answer them with layer-wise ridge probing, LEACE-style cross-covariance subspace erasure, and a transparent classifier benchmarked against a random-feature baseline. The audit covers three foundation models (CSBrain, CBraMod, LaBraM), five clinical tasks (MDD, Stress, ISRUC-Sleep, TUSL, Siena), and a 63-feature lexicon spanning six families. Of the resulting $3 \times 5 \times 63 = 945$ (model, task, feature) units, $648$ ($68.6\%$) are representation-causal (RC) and $199$ ($21.1\%$) are encoded-only. Across tasks, $50$ features qualify as universal candidates with strong support (RC in all three architectures) in two or more tasks. Frequency-domain features dominate, but the other five families each contribute substantial causal mass. Confirmed features recover, on average, $79.3\%$ of the foundation model's advantage over the random baseline, with a clean task gradient (from MDD at $\approx 0.99$ down to Stress at $\approx 0.56$): tasks near ceiling are almost fully recovered by the lexicon, while harder tasks leave a non-trivial residual that pinpoints a concrete target for future concept discovery.
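To make the "what does the model learn" and "how much can be explained" questions concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' code. The helper names (`ridge_probe_r2`, `recovered_fraction`), the synthetic hidden states, the regularization strength, and the exact form of the recovery ratio are all illustrative assumptions; the abstract does not give these details, and LEACE-style erasure is not covered here.

```python
import numpy as np


def ridge_probe_r2(H_train, y_train, H_test, y_test, alpha=1.0):
    """Fit a closed-form ridge probe on hidden states; return held-out R^2."""
    # Centre with training statistics only, which handles the intercept.
    mu_h, mu_y = H_train.mean(axis=0), y_train.mean()
    Hc = H_train - mu_h
    # Ridge solution: w = (H^T H + alpha * I)^{-1} H^T y
    gram = Hc.T @ Hc + alpha * np.eye(Hc.shape[1])
    w = np.linalg.solve(gram, Hc.T @ (y_train - mu_y))
    pred = (H_test - mu_h) @ w + mu_y
    ss_res = np.sum((y_test - pred) ** 2)
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def recovered_fraction(score_lexicon, score_model, score_random):
    """Assumed definition: share of the model's advantage over the random
    baseline that a lexicon-based transparent classifier recovers."""
    return (score_lexicon - score_random) / (score_model - score_random)


# Synthetic stand-in for per-layer hidden states (hypothetical data).
rng = np.random.default_rng(0)
n, d = 400, 16
feature = rng.normal(size=n)    # stand-in for one clinical feature
direction = rng.normal(size=d)  # direction along which it is encoded
layers = {
    # Pure noise: the feature is not linearly decodable.
    "layer_0": rng.normal(size=(n, d)),
    # Noise plus a linear trace of the feature: decodable by a ridge probe.
    "layer_1": rng.normal(size=(n, d)) + np.outer(feature, direction),
}

# Layer-wise probing: one probe per layer, scored on a held-out split.
r2 = {
    name: ridge_probe_r2(H[:300], feature[:300], H[300:], feature[300:])
    for name, H in layers.items()
}
for name, score in r2.items():
    print(f"{name}: held-out R^2 = {score:.3f}")
```

In this toy setting the probe should score near zero on `layer_0` and much higher on `layer_1`, the pattern a layer-wise probe looks for. A high probe score shows only that a feature is encoded; whether the model uses it is the question the erasure step addresses. This mirrors the abstract's distinction between encoded-only and representation-causal units.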