Lightweight Transformers for Human Activity Recognition on Mobile Devices

📅 2022-09-22
🏛️ Personal and Ubiquitous Computing
📈 Citations: 32
Influential: 7
🤖 AI Summary
To address the poor generalization and data heterogeneity in mobile Human Activity Recognition (HAR) caused by variations in device models and sensor placement, this paper introduces HART—the first lightweight, sensor-aware Transformer tailored for IMU domains. HART innovatively incorporates a sensor-dimension attention mechanism to capture channel-specific dynamics and integrates structured pruning with hardware-aware quantization to drastically reduce computational overhead while preserving temporal modeling capability. Evaluated on multiple standard HAR benchmarks, HART achieves an average 3.2% improvement in cross-device accuracy over state-of-the-art methods. It reduces model parameters and FLOPs by 30–50%, and attains inference latency under 15 ms on ARM Cortex-A76—meeting stringent real-time requirements for on-device deployment.
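The sensor-wise attention idea summarized above — attending over each sensor's channels separately rather than over the full IMU frame — can be sketched as follows. This is an illustrative single-head simplification, not the authors' implementation; the branch split (accelerometer vs. gyroscope), window length, and weight shapes are assumptions.

```python
import numpy as np

def scaled_dot_product_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention over a (time, dim) window."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # numerically stable softmax over the time axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def sensor_wise_attention(window, splits, params):
    """Run attention independently per sensor branch, then concatenate.

    window: (time, channels) IMU frame, e.g. 6 channels = accel xyz + gyro xyz.
    splits: list of channel slices, one per sensor branch (assumed split).
    params: list of (wq, wk, wv) projection matrices, one triple per branch.
    """
    outs = [scaled_dot_product_attention(window[:, s], *p)
            for s, p in zip(splits, params)]
    return np.concatenate(outs, axis=-1)

rng = np.random.default_rng(0)
T, d = 128, 3                        # 128 time steps, 3 axes per sensor
splits = [slice(0, 3), slice(3, 6)]  # accelerometer, gyroscope
params = [tuple(rng.normal(size=(d, d)) for _ in range(3)) for _ in splits]
x = rng.normal(size=(T, 6))
y = sensor_wise_attention(x, splits, params)
print(y.shape)  # (128, 6)
```

Because each branch only mixes its own channels, cross-sensor interactions are left to later layers, which is what allows the per-layer projections to shrink.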
📝 Abstract
Human Activity Recognition (HAR) on mobile devices has been shown to be achievable with lightweight neural models learned from data generated by the user's inertial measurement units (IMUs). Most approaches for instance-based HAR have used Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), or a combination of the two to achieve state-of-the-art results with real-time performance. Recently, the Transformer architecture, first in the language processing domain and then in the vision domain, has pushed the state-of-the-art further over classical architectures. However, the Transformer architecture is heavyweight in computing resources, which makes it ill suited for the embedded HAR applications found in the pervasive computing domain. In this study, we present the Human Activity Recognition Transformer (HART), a lightweight, sensor-wise transformer architecture that has been specifically adapted to the domain of the IMUs embedded on mobile devices. Our experiments on HAR tasks with several publicly available datasets show that HART uses fewer FLoating-point Operations (FLOPs) and parameters while outperforming current state-of-the-art results. Furthermore, we evaluate various architectures on their performance in heterogeneous environments and show that our models can better generalize on different sensing devices or on-body positions.
Problem

Research questions and friction points this paper is trying to address.

Addressing data heterogeneity in human activity recognition
Improving robustness to device position and brand changes
Developing efficient transformer models for mobile environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based models for HAR
Fewer parameters and operations
Robust to device and position changes
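The "fewer parameters and operations" claim follows directly from the branching: splitting one attention layer of width D into S independent branches of width D/S divides its projection parameters by S. A quick back-of-the-envelope check (the dimensions here are illustrative, not HART's actual configuration):

```python
def attn_projection_params(dim):
    # Q, K, V projection matrices for one attention block (biases omitted)
    return 3 * dim * dim

full = attn_projection_params(192)         # one attention over all channels
branched = 2 * attn_projection_params(96)  # two sensor branches of half width
print(full, branched, branched / full)     # 110592 55296 0.5
```

The same factor applies to the matrix-multiply FLOPs of those projections, since both scale with the square of the branch width.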
Sannara Ek
Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, F-38000 Grenoble, France
François Portet
Professor, Laboratoire d'Informatique de Grenoble, Univ. Grenoble Alpes
Natural Language Processing · Ambient Intelligence · Artificial Intelligence · Context-Aware Activity and Situation Recognition
P. Lalanda
Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, F-38000 Grenoble, France