WS-IMUBench: Can Weakly Supervised Methods from Audio, Image, and Video Be Adapted for IMU-based Temporal Action Localization?

📅 2026-02-02
🤖 AI Summary
This work addresses a key bottleneck in IMU-based temporal action localization, its reliance on expensive frame-level annotations, by establishing the first reproducible benchmark for weakly supervised learning with only sequence-level labels. The study systematically evaluates how well seven representative weakly supervised methods from audio, image, and video transfer across seven public IMU datasets, covering key components such as proposal generation, temporal modeling, and boundary prediction. Based on over 3,540 training runs and 7,080 inference evaluations, the findings indicate that temporal-domain methods transfer more stably than image-derived, proposal-based approaches, and that weak supervision is competitive on datasets with longer actions and higher-dimensional sensor data. The analysis further identifies short actions, temporal ambiguity, and poor proposal quality as the primary sources of failure, offering clear directions for future research.
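
The summary names proposal generation and boundary prediction as components under study, and short actions as a primary failure source. As a minimal sketch, assuming a method already emits per-frame scores for one action class, the simplest form of proposal generation thresholds those scores and merges consecutive above-threshold frames into segments; temporal IoU (tIoU) then measures how well a predicted segment's boundaries match a ground-truth action. The function names, threshold, and frame rate below are hypothetical, and the benchmarked methods may generate proposals quite differently.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float, float]  # (start_s, end_s, mean_score)


def scores_to_segments(scores: List[float], threshold: float, frame_rate_hz: float) -> List[Segment]:
    """Turn one class's per-frame scores into candidate segments (hypothetical helper).

    Consecutive frames scoring above `threshold` form one segment; start/end
    times are frame indices divided by the frame rate (end is exclusive).
    """
    segments: List[Segment] = []
    start: Optional[int] = None
    for i, s in enumerate(list(scores) + [float("-inf")]):  # sentinel closes a trailing run
        if s > threshold and start is None:
            start = i  # a new above-threshold run begins
        elif s <= threshold and start is not None:
            run = scores[start:i]
            segments.append((start / frame_rate_hz, i / frame_rate_hz, sum(run) / len(run)))
            start = None
    return segments


def temporal_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Intersection-over-union of two [start, end) time intervals, in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


# Toy scores at a hypothetical 2 Hz frame rate.
print(scores_to_segments([0.1, 0.8, 0.9, 0.2, 0.7, 0.7], threshold=0.5, frame_rate_hz=2.0))

# The same 0.5 s boundary shift costs far more on a short action than on a long one.
print(round(temporal_iou((10.0, 11.0), (10.5, 11.5)), 2))  # 0.33
print(round(temporal_iou((10.0, 20.0), (10.5, 20.5)), 2))  # 0.9
```

With a 1 s ground-truth action, a 0.5 s boundary error drops tIoU to about 0.33, while the same error on a 10 s action still yields about 0.90, which illustrates why tIoU-based evaluation is much harsher on short actions.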

📝 Abstract
IMU-based Human Activity Recognition (HAR) has enabled a wide range of ubiquitous computing applications, yet its dominant clip-classification paradigm cannot capture the rich temporal structure of real-world behaviors. This motivates a shift toward IMU Temporal Action Localization (IMU-TAL), which predicts both action categories and their start/end times in continuous streams. However, progress is strongly bottlenecked by the need for dense, frame-level boundary annotations, which are costly and difficult to scale. To address this bottleneck, we introduce WS-IMUBench, a systematic benchmark study of weakly supervised IMU-TAL (WS-IMU-TAL) under only sequence-level labels. Rather than proposing a new localization algorithm, we evaluate how well established weakly supervised localization paradigms from audio, image, and video transfer to this setting. We benchmark seven representative weakly supervised methods on seven public IMU datasets, resulting in over 3,540 model training runs and 7,080 inference evaluations. Guided by three research questions on transferability, effectiveness, and insights, our findings show that (i) transfer is modality-dependent, with temporal-domain methods generally more stable than image-derived proposal-based approaches; (ii) weak supervision can be competitive on favorable datasets (e.g., those with longer actions and higher-dimensional sensing); and (iii) the dominant failure modes arise from short actions, temporal ambiguity, and poor proposal quality. Finally, we outline concrete directions for advancing WS-IMU-TAL (e.g., IMU-specific proposal generation, boundary-aware objectives, and stronger temporal reasoning). Beyond individual results, WS-IMUBench establishes a reproducible benchmarking template (datasets, protocols, and analyses) to accelerate community-wide progress toward scalable WS-IMU-TAL.
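
Under sequence-level supervision, each training sequence carries only the set of actions it contains, yet the model must still score individual time steps. The abstract does not say how each benchmarked method bridges that gap; a common pattern in weakly supervised temporal localization is multiple-instance-learning (MIL) style top-k pooling, sketched here under that assumption. The function names, the choice of k, and the toy logits are hypothetical and for illustration only.

```python
import math
from typing import List


def topk_mean_pool(frame_logits: List[List[float]], k: int) -> List[float]:
    """Collapse per-frame class logits (T x C) into one logit per class.

    Each class's sequence-level logit is the mean of its k highest frame
    logits, so only a few confident frames need to fire for a positive label.
    """
    num_classes = len(frame_logits[0])
    pooled = []
    for c in range(num_classes):
        column = sorted((frame[c] for frame in frame_logits), reverse=True)
        top = column[:k]
        pooled.append(sum(top) / len(top))
    return pooled


def bce_with_logits(logit: float, label: int) -> float:
    """Numerically stable binary cross-entropy computed on a raw logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def sequence_level_loss(frame_logits: List[List[float]], labels: List[int], k: int) -> float:
    """Average multi-label BCE using only sequence-level 0/1 labels (no frame labels)."""
    pooled = topk_mean_pool(frame_logits, k)
    return sum(bce_with_logits(z, y) for z, y in zip(pooled, labels)) / len(labels)


# Toy example: 4 frames, 2 classes; the sequence is labeled as containing class 0 only.
frame_logits = [
    [2.0, -1.0],
    [3.0, -2.0],
    [-1.0, -1.5],
    [0.0, -3.0],
]
print(topk_mean_pool(frame_logits, k=2))  # [2.5, -1.25]
print(round(sequence_level_loss(frame_logits, labels=[1, 0], k=2), 4))  # 0.1654
```

If k exceeds an action's duration in frames, the pooled score necessarily mixes in background frames, which is one plausible reason short actions are hard to localize under this kind of supervision.
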
Problem

Research questions and friction points this paper is trying to address.

IMU-based Temporal Action Localization
Weakly Supervised Learning
Sequence-level Labels
Human Activity Recognition
Temporal Action Boundary
Innovation

Methods, ideas, or system contributions that make the work stand out.

Weakly Supervised Learning
IMU-based Temporal Action Localization
Benchmarking
Transferability
Sequence-level Labels
Pei Li
School of Software Engineering, Xi’an Jiaotong University, China
Jiaxi Yin
School of Software Engineering, Xi’an Jiaotong University, China
Lei Ouyang
School of Software Engineering, Xi’an Jiaotong University, China
Shihan Pan
School of Software Engineering, Xi’an Jiaotong University, China
Ge Wang
Associate Professor of Music (also Computer Science), Stanford University
Artful Design, Computer Music, Interaction Design, Laptop Orchestra, Music Programming Language Design
Han Ding
School of Computer Science and Technology, Xi’an Jiaotong University, China
Fei Wang
Xi'an Jiaotong University
computer vision, artificial intelligence