AI Summary
This study investigates whether surface electromyography (sEMG) envelopes can be linearly predicted from speech representations across aloud, mimed, and subvocal speech, and evaluates the suitability of different speech representations as intermediate targets. Using multivariate temporal response function (mTRF) models with elastic-net regularization in a sentence-level cross-validation framework, the authors compare the linear predictive performance of Speech Articulatory Coding (SPARC) features against phoneme one-hot encodings for modeling sEMG envelopes. SPARC yields higher prediction accuracy than the phoneme representation on nearly all electrodes and in all speech modes; aloud and mimed speech perform comparably, while subvocal speech remains above chance. The mTRF weights show anatomically interpretable relationships between electrode sites and articulatory movements that stay consistent across modes. The work provides evidence of SPARC's robust encoding of sEMG across multiple speech modes and supports it as an interpretable intermediate target for sEMG-based silent-speech modeling.
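To make the encoding setup concrete, the following is a minimal sketch of an elastic-net temporal response function fit with sentence-level cross-validation, written with NumPy only. It is not the authors' implementation: the function names (`lag_matrix`, `elastic_net_fit`, `cross_validate`, `demo`), the lag range expressed in samples, the regularization values, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def lag_matrix(X, lags):
    """Stack time-shifted copies of X (time x features), one copy per lag in samples."""
    T, F = X.shape
    out = np.zeros((T, F * len(lags)))
    for i, lag in enumerate(lags):
        shifted = np.zeros_like(X)
        if lag > 0:
            shifted[lag:] = X[:T - lag]      # feature at t - lag predicts response at t
        elif lag < 0:
            shifted[:T + lag] = X[-lag:]
        else:
            shifted[:] = X
        out[:, i * F:(i + 1) * F] = shifted
    return out

def elastic_net_fit(X, y, alpha=0.01, l1_ratio=0.5, n_iter=100):
    """Coordinate descent for
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2."""
    n, p = X.shape
    w = np.zeros(p)
    resid = y.copy()                          # equals y - X @ w while w is all zeros
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ resid / n + col_sq[j] * w[j]
            w_new = np.sign(rho) * max(abs(rho) - alpha * l1_ratio, 0.0)
            w_new /= col_sq[j] + alpha * (1.0 - l1_ratio)
            resid += X[:, j] * (w[j] - w_new)  # keep the residual in sync with w
            w[j] = w_new
    return w

def cross_validate(sentences, lags, n_folds=5, alpha=0.01, l1_ratio=0.5):
    """sentences: list of (features[T, F], envelope[T]) pairs, one pair per sentence.
    Whole sentences are held out, so no sentence contributes to both train and test."""
    lagged = [(lag_matrix(X, lags), y) for X, y in sentences]  # lag within each sentence
    folds = np.array_split(np.arange(len(lagged)), n_folds)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx.tolist())
        train = [lagged[i] for i in range(len(lagged)) if i not in held_out]
        test = [lagged[i] for i in test_idx]
        X_tr = np.vstack([x for x, _ in train])
        y_tr = np.concatenate([y for _, y in train])
        X_te = np.vstack([x for x, _ in test])
        y_te = np.concatenate([y for _, y in test])
        mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0) + 1e-12  # training statistics only
        y_mu = y_tr.mean()
        w = elastic_net_fit((X_tr - mu) / sd, y_tr - y_mu, alpha, l1_ratio)
        pred = ((X_te - mu) / sd) @ w + y_mu
        scores.append(np.corrcoef(pred, y_te)[0, 1])           # Pearson r per fold
    return float(np.mean(scores))

def demo(seed=0):
    """Synthetic check: one 'electrode' envelope driven by 3 lagged features (assumed data)."""
    rng = np.random.default_rng(seed)
    lags = list(range(5))
    w_true = rng.normal(size=3 * len(lags))
    sentences = []
    for _ in range(20):
        X = rng.normal(size=(200, 3))
        y = lag_matrix(X, lags) @ w_true + 0.1 * rng.normal(size=200)
        sentences.append((X, y))
    return cross_validate(sentences, lags)

demo_score = demo()
```

In this sketch each sentence is lagged separately so that time-shifted features never leak across sentence boundaries, and standardization uses training-fold statistics only. In a real analysis the fit would be repeated per electrode and per speech mode, with `alpha` and `l1_ratio` tuned inside the training folds.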
Abstract
We test whether Speech Articulatory Coding (SPARC) features can linearly predict surface electromyography (sEMG) envelopes across aloud, mimed, and subvocal speech in twenty-four subjects. Using elastic-net multivariate temporal response function (mTRF) models with sentence-level cross-validation, SPARC yields higher prediction accuracy than phoneme one-hot representations on nearly all electrodes and in all speech modes. Aloud and mimed speech perform comparably, and subvocal speech remains above chance, indicating detectable articulatory activity. Variance partitioning shows a substantial unique contribution from SPARC and a minimal unique contribution from phoneme features. mTRF weight patterns reveal anatomically interpretable relationships between electrode sites and articulatory movements that remain consistent across modes. This study focuses on representation and encoding analysis (not end-to-end decoding) and supports SPARC as a robust and interpretable intermediate target for sEMG-based silent-speech modeling.
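The variance-partitioning logic mentioned above can be sketched as follows: fit a joint model and two single-feature-set models, then take the unique contribution of one set as the joint held-out R² minus the R² of the other set alone. This is an illustrative sketch, not the study's pipeline. It swaps in ordinary least squares for the elastic-net mTRF to stay short, and the names `heldout_r2`, `variance_partition`, and `demo`, along with the synthetic "articulatory" and "phoneme-like" features, are assumptions.

```python
import numpy as np

def heldout_r2(X_train, y_train, X_test, y_test):
    """Fit least squares with an intercept on the training split; return R^2 on the test split."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    resid = y_test - A_test @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y_test - y_test.mean()) ** 2)

def variance_partition(A_tr, B_tr, y_tr, A_te, B_te, y_te):
    """Split held-out explained variance into unique and shared parts for feature sets A and B."""
    r2_a = heldout_r2(A_tr, y_tr, A_te, y_te)
    r2_b = heldout_r2(B_tr, y_tr, B_te, y_te)
    r2_ab = heldout_r2(np.hstack([A_tr, B_tr]), y_tr, np.hstack([A_te, B_te]), y_te)
    return {
        "joint": r2_ab,
        "unique_a": r2_ab - r2_b,        # what A adds beyond B
        "unique_b": r2_ab - r2_a,        # what B adds beyond A
        "shared": r2_a + r2_b - r2_ab,   # variance either set can explain
    }

def demo(seed=0):
    """Synthetic example: A is continuous, B is a coarse one-hot code derived from A."""
    rng = np.random.default_rng(seed)
    n = 2000
    A = rng.normal(size=(n, 4))                       # stand-in for continuous features
    bins = np.digitize(A[:, 0], [-1.0, 0.0, 1.0])     # 4 categories from the first column
    B = np.eye(4)[bins]                               # stand-in for one-hot features
    y = A @ np.array([1.0, 0.8, -0.6, 0.4]) + 0.3 * rng.normal(size=n)
    split = 1500                                      # first 1500 samples train, rest test
    return variance_partition(A[:split], B[:split], y[:split],
                              A[split:], B[split:], y[split:])

parts = demo()
```

Because the one-hot code in this toy example is a coarse function of the continuous features, it carries shared but almost no unique variance. That is the qualitative pattern the abstract reports for phoneme features relative to SPARC, although the numbers here come from synthetic data and say nothing about the study's actual effect sizes.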