MoPFormer: Motion-Primitive Transformer for Wearable-Sensor Activity Recognition

📅 2025-05-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address poor interpretability and weak cross-dataset generalization in wearable-sensor-based human activity recognition (HAR), this paper proposes the Motion-Primitive Transformer (MoPFormer). Methodologically, MoPFormer introduces, for the first time, a discrete representation of motion primitives—semantically meaningful and physically interpretable elementary motion units—derived from IMU signals. It integrates context-aware embedding with masked motion modeling pretraining within a Transformer architecture to enable primitive-level temporal dependency learning and self-supervised reconstruction. Evaluated on six HAR benchmarks, MoPFormer achieves state-of-the-art performance, with substantial improvements in cross-device and cross-scenario accuracy. Ablation studies and visualization analyses confirm that motion primitives exhibit high stability and transferability, simultaneously enhancing model interpretability and generalization robustness.

📝 Abstract
Human Activity Recognition (HAR) with wearable sensors is challenged by limited interpretability, which significantly impacts cross-dataset generalization. To address this challenge, we propose the Motion-Primitive Transformer (MoPFormer), a novel self-supervised framework that enhances interpretability by tokenizing inertial measurement unit signals into semantically meaningful motion primitives and leverages a Transformer architecture to learn rich temporal representations. MoPFormer comprises two stages: the first partitions multi-channel sensor streams into short segments and quantizes them into discrete "motion primitive" codewords, while the second enriches those tokenized sequences through a context-aware embedding module and then processes them with a Transformer encoder. MoPFormer can be pre-trained using a masked motion-modeling objective that reconstructs missing primitives, enabling it to develop robust representations across diverse sensor configurations. Experiments on six HAR benchmarks demonstrate that MoPFormer not only outperforms state-of-the-art methods but also generalizes successfully across multiple datasets. Most importantly, the learned motion primitives significantly enhance both interpretability and cross-dataset performance by capturing fundamental movement patterns that remain consistent across similar activities regardless of dataset origin.
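The page carries no implementation details for the first stage, but the abstract's description (window the multi-channel stream, quantize each window into a codeword) maps onto standard vector quantization. Below is a minimal, hypothetical sketch assuming a k-means-style codebook; the function names, window length, and codebook size are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of stage one: partition multi-channel IMU streams into
# short windows and quantize each window into a discrete "motion primitive"
# codeword by nearest-centroid lookup. All names and sizes are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def segment_windows(signal: np.ndarray, win: int = 20) -> np.ndarray:
    """Split a (T, C) multi-channel stream into (N, win*C) flattened windows."""
    n = signal.shape[0] // win
    return signal[: n * win].reshape(n, -1)

def fit_codebook(train_streams, win: int = 20, n_codes: int = 64) -> KMeans:
    """Learn a motion-primitive codebook from unlabeled training streams."""
    segments = np.concatenate([segment_windows(s, win) for s in train_streams])
    return KMeans(n_clusters=n_codes, n_init=4, random_state=0).fit(segments)

def tokenize(signal: np.ndarray, codebook: KMeans, win: int = 20) -> np.ndarray:
    """Map a raw stream to a sequence of primitive codeword indices."""
    return codebook.predict(segment_windows(signal, win))

# Example: 6-channel IMU streams (accelerometer + gyroscope)
streams = [np.random.randn(1000, 6) for _ in range(8)]
codebook = fit_codebook(streams)
tokens = tokenize(streams[0], codebook)  # array of 50 codeword ids
```

The resulting token sequences are what the second stage embeds and feeds to the Transformer encoder.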
Problem

Research questions and friction points this paper is trying to address.

Enhancing interpretability in wearable-sensor activity recognition
Improving cross-dataset generalization for Human Activity Recognition
Learning robust temporal representations from inertial sensor data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tokenizes sensor signals into motion primitives
Uses Transformer for temporal representation learning
Pre-trained with a masked motion-modeling objective (see the sketch after this list)
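The masked motion-modeling objective mirrors masked language modeling: random primitive tokens are hidden and a Transformer encoder is trained to reconstruct them. A minimal PyTorch sketch follows; the class name, masking ratio, and layer sizes are assumptions for illustration, and positional encodings are omitted for brevity.

```python
# Hypothetical sketch of masked motion-modeling pretraining: mask a fraction
# of primitive tokens and reconstruct them with a Transformer encoder.
import torch
import torch.nn as nn

class MaskedPrimitiveModel(nn.Module):
    def __init__(self, n_codes=64, d_model=128, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(n_codes + 1, d_model)  # +1 for the [MASK] id
        self.mask_id = n_codes
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_codes)  # predict original codeword ids

    def forward(self, tokens: torch.Tensor, mask_ratio: float = 0.15):
        mask = torch.rand_like(tokens, dtype=torch.float) < mask_ratio
        mask[..., 0] = True  # guarantee at least one masked position
        masked = tokens.clone()
        masked[mask] = self.mask_id
        hidden = self.encoder(self.embed(masked))
        logits = self.head(hidden)
        # Cross-entropy loss only on the masked positions
        return nn.functional.cross_entropy(logits[mask], tokens[mask])

model = MaskedPrimitiveModel()
tokens = torch.randint(0, 64, (4, 50))  # batch of primitive token sequences
loss = model(tokens)
loss.backward()
```

After pretraining, the encoder's primitive-level representations can be fine-tuned for activity classification, which is where the reported cross-dataset gains would be measured.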
👥 Authors
Hao Zhang
Southern University of Science and Technology
Zhuang Zhan
Southern University of Science and Technology, City University of Hong Kong
Xuehao Wang
Zhejiang University
Xiaodong Yang
Institute of Computing Technology, Chinese Academy of Sciences
Yu Zhang
Southern University of Science and Technology