Scaling and Distilling Transformer Models for sEMG

📅 2025-07-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Limited training data and constrained computational resources for edge deployment have restricted model scaling in surface electromyography (sEMG) decoding. This work adapts Transformer architectures to sEMG, scaling them to 110 million parameters to substantially improve cross-subject generalization. The authors introduce architecture optimizations tailored to sEMG's temporal dynamics and a knowledge distillation strategy that compresses the model by 50× (to 2% of its original size) with <1.5% accuracy degradation. Evaluated on multiple benchmark datasets, the compressed model achieves state-of-the-art cross-subject decoding performance, outperforming existing methods. This establishes a deployable large-model paradigm for high-accuracy, low-latency, real-time sEMG-based human–machine interfaces.

📝 Abstract
Surface electromyography (sEMG) signals offer a promising avenue for developing innovative human-computer interfaces by providing insights into muscular activity. However, the limited volume of training data and computational constraints during deployment have restricted the investigation of scaling up the model size for solving sEMG tasks. In this paper, we demonstrate that vanilla transformer models can be effectively scaled up on sEMG data and yield improved cross-user performance up to 110M parameters, surpassing the model size regime investigated in other sEMG research (usually <10M parameters). We show that >100M-parameter models can be effectively distilled into models 50x smaller with minimal loss of performance (<1.5% absolute). This results in efficient and expressive models suitable for complex real-time sEMG tasks in real-world environments.
Problem

Research questions and friction points this paper is trying to address.

Scaling transformer models for sEMG tasks with limited data
Distilling large models into smaller ones efficiently
Improving cross-user performance in real-time sEMG applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scale transformer models to 110M parameters
Distill large models into 50x smaller ones
Maintain performance with minimal loss (<1.5%)
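The paper does not spell out its distillation objective here; as a minimal sketch, a standard soft-target distillation loss in the style of Hinton et al. (2015) — blending a temperature-softened KL term against the teacher with ordinary cross-entropy on hard labels — could look like the following. The function names, `temperature`, and `alpha` values are illustrative assumptions, not the paper's actual hyperparameters.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Blend of soft-target KL term and hard-label cross-entropy.

    alpha weights the KL divergence from the teacher's softened outputs;
    (1 - alpha) weights standard cross-entropy on ground-truth labels.
    Hyperparameter values here are illustrative, not from the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = np.log(softmax(student_logits, temperature) + 1e-12)
    # KL(teacher || student), scaled by T^2 so gradients keep a
    # comparable magnitude as the temperature changes
    kl = np.mean(np.sum(
        p_teacher * (np.log(p_teacher + 1e-12) - log_p_student), axis=-1))
    soft_loss = (temperature ** 2) * kl
    # ordinary cross-entropy against the hard labels (temperature = 1)
    log_p_hard = np.log(softmax(student_logits) + 1e-12)
    hard_loss = -np.mean(log_p_hard[np.arange(len(labels)), labels])
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

In this formulation the student is trained to match the teacher's full output distribution rather than only its top prediction, which is what lets a 50×-smaller model retain most of the large model's accuracy.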
Nicholas Mehlman
Viterbi School of Engineering, University of Southern California
Jean-Christophe Gagnon-Audet
Meta FAIR
Michael Shvartsman
Research Scientist, Meta Reality Labs Research
Computational cognitive science and machine learning for neuroscience
Kelvin Niu
Meta FAIR
Alexander H. Miller
Meta FAIR
Shagun Sodhani
Google DeepMind
Machine Learning, Reinforcement Learning, Lifelong Learning