🤖 AI Summary
To address poor interpretability and weak cross-dataset generalization in wearable-sensor-based human activity recognition (HAR), this paper proposes the Motion-Primitive Transformer (MoPFormer). Methodologically, MoPFormer introduces, for the first time, a discrete representation of motion primitives—semantically meaningful and physically interpretable elementary motion units—derived from IMU signals. It integrates context-aware embedding with masked motion modeling pretraining within a Transformer architecture to enable primitive-level temporal dependency learning and self-supervised reconstruction. Evaluated on six HAR benchmarks, MoPFormer achieves state-of-the-art performance, with substantial improvements in cross-device and cross-scenario accuracy. Ablation studies and visualization analyses confirm that motion primitives exhibit high stability and transferability, simultaneously enhancing model interpretability and generalization robustness.
📝 Abstract
Human Activity Recognition (HAR) with wearable sensors is challenged by limited interpretability, which significantly impacts cross-dataset generalization. To address this challenge, we propose the Motion-Primitive Transformer (MoPFormer), a novel self-supervised framework that enhances interpretability by tokenizing inertial measurement unit (IMU) signals into semantically meaningful motion primitives and leverages a Transformer architecture to learn rich temporal representations. MoPFormer comprises two stages: the first partitions multi-channel sensor streams into short segments and quantizes them into discrete "motion primitive" codewords, while the second enriches the tokenized sequences through a context-aware embedding module and then processes them with a Transformer encoder. MoPFormer can be pre-trained with a masked motion-modeling objective that reconstructs missing primitives, enabling it to develop robust representations across diverse sensor configurations. Experiments on six HAR benchmarks demonstrate that MoPFormer not only outperforms state-of-the-art methods but also generalizes successfully across multiple datasets. Most importantly, the learned motion primitives significantly enhance both interpretability and cross-dataset performance by capturing fundamental movement patterns that remain consistent across similar activities regardless of dataset origin.
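The first-stage tokenization described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual implementation: the window length, codebook size, flattening scheme, and the `tokenize_imu` helper are all assumptions made for the example; the paper does not specify how its codebook is learned or how segments are matched to codewords.

```python
import numpy as np

def tokenize_imu(stream: np.ndarray, codebook: np.ndarray, win: int = 16) -> np.ndarray:
    """Quantize an IMU stream into discrete motion-primitive tokens (illustrative).

    stream:   (T, C) multi-channel sensor signal, e.g. C=6 for accel + gyro.
    codebook: (K, win*C) centroids, one per hypothetical motion primitive.
    Returns one codeword index per non-overlapping window of length `win`.
    """
    T, C = stream.shape
    n = T // win
    # Partition the stream into short segments and flatten each window.
    segs = stream[: n * win].reshape(n, win * C)
    # Nearest-centroid lookup: squared distance from each segment to each codeword.
    dists = ((segs[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
stream = rng.normal(size=(128, 6))        # synthetic 6-axis IMU signal
codebook = rng.normal(size=(32, 16 * 6))  # K=32 primitives (assumed size)
tokens = tokenize_imu(stream, codebook)
print(tokens.shape)  # one primitive token per window: (8,)
```

The resulting token sequence is what the second stage would consume: embed the discrete tokens, add context-aware information, and feed them to a Transformer encoder, with pre-training done by masking a subset of tokens and predicting the missing codeword indices.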