WS-IMUBench: Can Weakly Supervised Methods from Audio, Image, and Video Be Adapted for IMU-based Temporal Action Localization?

📅 2026-02-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

IMU-based temporal action localization is bottlenecked by its reliance on expensive frame-level annotations. This work addresses that bottleneck by establishing the first reproducible benchmark for weakly supervised learning from sequence-level labels. The study systematically evaluates the transferability of seven representative methods across seven public IMU datasets, covering key components such as proposal generation, temporal modeling, and boundary prediction. Based on over 3,540 training runs and 7,080 inference evaluations, the findings indicate that temporal-domain methods transfer more stably than image-derived proposal-based approaches, and that weak supervision can be competitive on favorable datasets, such as those with longer actions and higher-dimensional sensing. The analysis further identifies short actions, temporal ambiguity, and poor proposal quality as the dominant failure modes and outlines directions for future research.

📝 Abstract

IMU-based Human Activity Recognition (HAR) has enabled a wide range of ubiquitous computing applications, yet its dominant clip classification paradigm cannot capture the rich temporal structure of real-world behaviors. This motivates a shift toward IMU Temporal Action Localization (IMU-TAL), which predicts both action categories and their start/end times in continuous streams. However, current progress is strongly bottlenecked by the need for dense, frame-level boundary annotations, which are costly and difficult to scale. To address this bottleneck, we introduce WS-IMUBench, a systematic benchmark study of weakly supervised IMU-TAL (WS-IMU-TAL) under only sequence-level labels. Rather than proposing a new localization algorithm, we evaluate how well established weakly supervised localization paradigms from audio, image, and video transfer to IMU-TAL. We benchmark seven representative weakly supervised methods on seven public IMU datasets, resulting in over 3,540 model training runs and 7,080 inference evaluations. Guided by three research questions on transferability, effectiveness, and insights, our findings show that (i) transfer is modality-dependent, with temporal-domain methods generally more stable than image-derived proposal-based approaches; (ii) weak supervision can be competitive on favorable datasets (e.g., with longer actions and higher-dimensional sensing); and (iii) dominant failure modes arise from short actions, temporal ambiguity, and proposal quality. Finally, we outline concrete directions for advancing WS-IMU-TAL (e.g., IMU-specific proposal generation, boundary-aware objectives, and stronger temporal reasoning). Beyond individual results, WS-IMUBench establishes a reproducible benchmarking template (datasets, protocols, and analyses) to accelerate community-wide progress toward scalable WS-IMU-TAL.
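
To make the weak-supervision setting concrete, the following is a minimal sketch of one common pattern in weakly supervised temporal localization, not a method taken from WS-IMUBench. It assumes a model already outputs a per-timestep score for one action class. The sequence-level label can only supervise a pooled score (here, a top-k mean), while start/end times are recovered afterwards by thresholding the per-timestep scores. The function names, the threshold, the sampling rate, and the example scores are all hypothetical.

```python
from typing import List, Tuple


def topk_mean(scores: List[float], k: int) -> float:
    """Pool per-timestep scores into one sequence-level score.

    This pooled value is the only quantity a sequence-level label
    can supervise directly (assumption: top-k mean pooling).
    """
    k = max(1, min(k, len(scores)))
    return sum(sorted(scores, reverse=True)[:k]) / k


def extract_segments(
    scores: List[float], threshold: float, rate_hz: float
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) for each maximal run with score >= threshold."""
    segments = []
    start = None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i  # a run of above-threshold timesteps begins
        elif s < threshold and start is not None:
            segments.append((start / rate_hz, i / rate_hz))
            start = None
    if start is not None:  # the run extends to the end of the sequence
        segments.append((start / rate_hz, len(scores) / rate_hz))
    return segments


# Hypothetical per-timestep scores for one class, sampled at 2 Hz.
frame_scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.1, 0.6, 0.9, 0.2]
seq_score = topk_mean(frame_scores, k=3)
segments = extract_segments(frame_scores, threshold=0.5, rate_hz=2.0)
print(seq_score, segments)
```

In this sketch, a short action shows up as a brief run of above-threshold scores, so a small change in the threshold can remove or merge segments, which is one intuitive reason short actions and temporal ambiguity are hard under sequence-level labels.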

Problem

Research questions and friction points this paper is trying to address.

IMU-based Temporal Action Localization

Weakly Supervised Learning

Sequence-level Labels

Human Activity Recognition

Temporal Action Boundary

Innovation

Methods, ideas, or system contributions that make the work stand out.

Weakly Supervised Learning

IMU-based Temporal Action Localization

Benchmarking