From Continuous sEMG Signals to Discrete Muscle State Tokens: A Robust and Interpretable Representation Framework

📅 2026-02-27

🤖 AI Summary

This study addresses the poor robustness and limited interpretability of surface electromyography (sEMG) decoding, which stem from high inter-subject variability and noise sensitivity. The authors propose a physiology-informed discretization framework: a sliding window aligned with the minimal muscle contraction cycle isolates individual activation events, ten time-frequency features (including RMS and MDF) are extracted from each window, and K-means clustering groups the windows into muscle-state tokens. Evaluated on the newly introduced multi-action, multi-muscle dataset ActionEMG-43 (43 actions, 16 muscle groups), the tokens show high cross-subject consistency (Cohen's Kappa = 0.82 ± 0.09) and yield Top-1 action recognition accuracies of 75.5% with a Vision Transformer and 67.9% with an SVM, compared with 72.8% and 64.4% for the raw-signal baselines, while reducing input dimensionality by 96% and supporting interpretable analysis of movement quality.
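The window, feature, and clustering steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it computes only the two features the summary names (RMS and MDF) rather than all ten, uses a plain Lloyd's K-means with a deterministic farthest-first initialisation, and runs on a synthetic signal. The sampling rate, window length, hop size, number of clusters, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def rms(window):
    """Root mean square amplitude of one window."""
    return float(np.sqrt(np.mean(np.square(window))))

def median_frequency(window, fs):
    """Frequency below which half of the window's spectral power lies."""
    power = np.abs(np.fft.rfft(window)) ** 2
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    cumulative = np.cumsum(power)
    idx = int(np.searchsorted(cumulative, cumulative[-1] / 2.0))
    return float(freqs[idx])

def window_features(signal, fs, win_len, hop):
    """Slide a window over a 1-D signal; return one [RMS, MDF] row per window."""
    rows = []
    for start in range(0, len(signal) - win_len + 1, hop):
        w = signal[start:start + win_len]
        rows.append([rms(w), median_frequency(w, fs)])
    return np.asarray(rows)

def assign(z, centers):
    """Index of the nearest center for every row of z."""
    dists = np.linalg.norm(z[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans_tokens(features, k, n_iter=100):
    """Lloyd's K-means on z-scored features; returns (tokens, centers)."""
    std = features.std(axis=0)
    z = (features - features.mean(axis=0)) / np.where(std > 0, std, 1.0)
    # Farthest-first initialisation: deterministic, chosen for this sketch only.
    centers = [z[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(z - c, axis=1) for c in centers], axis=0)
        centers.append(z[int(np.argmax(d))])
    centers = np.asarray(centers)
    for _ in range(n_iter):
        tokens = assign(z, centers)
        updated = np.asarray([
            z[tokens == j].mean(axis=0) if np.any(tokens == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return assign(z, centers), centers  # centers live in z-scored feature space

# Synthetic demo: alternating "rest" (weak noise) and "active" (80 Hz burst).
fs = 1000       # Hz, hypothetical sampling rate
win_len = 200   # samples, hypothetical stand-in for one contraction cycle
rng = np.random.default_rng(0)
t = np.arange(win_len) / fs

def rest():
    return 0.05 * rng.standard_normal(win_len)

def active():
    return np.sin(2 * np.pi * 80 * t) + 0.05 * rng.standard_normal(win_len)

signal = np.concatenate([seg() for _ in range(4) for seg in (rest, active)])
features = window_features(signal, fs, win_len, hop=win_len)
tokens, centers = kmeans_tokens(features, k=2)
print(tokens)  # one integer token per window
```

Each window is reduced to a single integer token, which is where the large drop in input dimensionality comes from. The features are z-scored before clustering because RMS and MDF are on very different scales. Whether the paper normalises in the same way is not stated here.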

📝 Abstract

Surface electromyography (sEMG) signals exhibit substantial inter-subject variability and are highly susceptible to noise, posing challenges for robust and interpretable decoding. To address these limitations, we propose a discrete representation of sEMG signals based on a physiology-informed tokenization framework. The method employs a sliding window aligned with the minimal muscle contraction cycle to isolate individual muscle activation events. From each window, ten time-frequency features, including root mean square (RMS) and median frequency (MDF), are extracted, and K-means clustering is applied to group segments into representative muscle-state tokens. We also introduce a large-scale benchmark dataset, ActionEMG-43, comprising 43 diverse actions and sEMG recordings from 16 major muscle groups across the body. Based on this dataset, we conduct extensive evaluations to assess the inter-subject consistency, representation capacity, and interpretability of the proposed sEMG tokens. Our results show that the token representation exhibits high inter-subject consistency (Cohen's Kappa = 0.82 ± 0.09), indicating that the learned tokens capture consistent and subject-independent muscle activation patterns. In action recognition tasks, models using sEMG tokens achieve Top-1 accuracies of 75.5% with ViT and 67.9% with SVM, outperforming raw-signal baselines (72.8% and 64.4%, respectively), despite a 96% reduction in input dimensionality. In movement quality assessment, the tokens intuitively reveal patterns of muscle underactivation and compensatory activation, offering interpretable insights into neuromuscular control. Together, these findings highlight the effectiveness of tokenized sEMG representations as a compact, generalizable, and physiologically meaningful feature space for applications in rehabilitation, human-machine interaction, and motor function analysis.
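To make the consistency metric concrete, the sketch below computes Cohen's Kappa between two token sequences. It assumes the two sequences come from two subjects performing the same action and are already time-aligned window by window. The abstract does not say how subjects are paired or aligned, so that setup, the function name, and the example sequences are all hypothetical.

```python
from collections import Counter

def cohens_kappa(tokens_a, tokens_b):
    """Cohen's Kappa: agreement between two label sequences, corrected for chance."""
    if len(tokens_a) != len(tokens_b) or not tokens_a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(tokens_a)
    # Observed agreement: fraction of windows with identical tokens.
    p_o = sum(a == b for a, b in zip(tokens_a, tokens_b)) / n
    # Expected agreement if both sequences were drawn independently
    # from their own token frequencies.
    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    p_e = sum(count_a[t] * count_b[t] for t in count_a.keys() | count_b.keys()) / n ** 2
    if p_e == 1.0:
        return 1.0  # both sequences use one identical token throughout
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical token sequences for two subjects performing the same action.
subject_1 = [0, 0, 1, 1, 2, 2, 1, 0]
subject_2 = [0, 0, 1, 1, 2, 1, 1, 0]
kappa = cohens_kappa(subject_1, subject_2)
print(round(kappa, 3))
```

Here 7 of 8 windows agree (observed agreement 0.875) and chance agreement is 23/64, giving a Kappa of 33/41, or about 0.805. A value like the reported 0.82 therefore means agreement well beyond what shared token frequencies alone would produce.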

Problem

Research questions and friction points this paper is trying to address.

sEMG

inter-subject variability

noise susceptibility

robust decoding

interpretable representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

sEMG tokenization

physiology-informed representation

muscle state tokens