🤖 AI Summary
Surface electromyography (sEMG) signals suffer from low signal-to-noise ratio and poor generalizability, which severely limits zero-shot gesture recognition. Method: We propose Contrastive Pose-EMG Pre-training (CPEP), a framework that explicitly aligns sEMG signals with hand skeletal pose representations via contrastive learning. CPEP employs a dual-branch architecture, an EMG encoder and a pose encoder, optimized to pull matched EMG-pose pairs together and push mismatched pairs apart, so that the EMG encoder learns modality-invariant, discriminative representations. Generalization is evaluated using linear-probing and zero-shot transfer protocols. Results: CPEP outperforms the emg2pose baseline by up to 21% on in-distribution gesture classification and by up to 72% on unseen gestures, demonstrating a substantial gain in out-of-distribution gesture recognition.
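The cross-modal positive-pair alignment and negative-pair decoupling described above is the standard symmetric contrastive (CLIP-style InfoNCE) pattern. A minimal sketch of such a loss follows; the function name, temperature value, and tensor shapes are illustrative assumptions, not details taken from the paper:

```python
import torch
import torch.nn.functional as F

def cpep_contrastive_loss(emg_emb, pose_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of time-aligned EMG/pose pairs.

    emg_emb, pose_emb: (batch, dim) outputs of the EMG and pose encoders.
    Pairs sharing a row index are positives; all other rows in the batch
    serve as negatives.
    """
    emg = F.normalize(emg_emb, dim=-1)
    pose = F.normalize(pose_emb, dim=-1)
    # (batch, batch) cosine-similarity logits between every EMG/pose pair.
    logits = emg @ pose.t() / temperature
    targets = torch.arange(emg.size(0), device=emg.device)
    # Cross-entropy in both directions pulls matched pairs together
    # and pushes mismatched pairs apart.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

The loss is symmetric in its two arguments, so neither modality is privileged during alignment.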
📝 Abstract
Hand gesture classification using high-quality structured data such as videos, images, and hand skeletons is a well-explored problem in computer vision. Leveraging low-power, cost-effective biosignals, e.g., surface electromyography (sEMG), allows for continuous gesture prediction on wearables. In this paper, we demonstrate that learning representations from weak-modality data that are aligned with those from structured, high-quality data can improve representation quality and enable zero-shot classification. Specifically, we propose a Contrastive Pose-EMG Pre-training (CPEP) framework to align EMG and pose representations, where we learn an EMG encoder that produces high-quality, pose-informative representations. We assess the gesture classification performance of our model through linear-probing and zero-shot setups. Our model outperforms emg2pose benchmark models by up to 21% on in-distribution gesture classification and 72% on unseen (out-of-distribution) gesture classification.
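Once EMG and pose representations share an embedding space, zero-shot classification of an unseen gesture can be done by comparing an EMG embedding against a pose "prototype" per candidate gesture. The following sketch assumes trained encoders already exist; the function name and shapes are illustrative, not from the paper:

```python
import torch
import torch.nn.functional as F

def zero_shot_classify(emg_emb, class_pose_embs):
    """Predict gestures by nearest pose prototype in the shared space.

    emg_emb: (batch, dim) embeddings from the trained EMG encoder.
    class_pose_embs: (num_classes, dim) pose embeddings, one per candidate
    gesture (e.g., the encoded hand skeleton of each unseen gesture).
    Returns a (batch,) tensor of predicted class indices.
    """
    emg = F.normalize(emg_emb, dim=-1)
    protos = F.normalize(class_pose_embs, dim=-1)
    # Highest cosine similarity wins; no classifier training is needed.
    return (emg @ protos.t()).argmax(dim=-1)
```

Because no weights are fit on the unseen gestures, this is the zero-shot setup; fitting a linear classifier on frozen EMG embeddings instead gives the linear-probing setup.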