Unsupervised Skeleton-Based Action Segmentation via Hierarchical Spatiotemporal Vector Quantization

📅 2026-04-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two problems in unsupervised skeleton-based action segmentation, segment length bias and insufficient spatiotemporal modeling, by proposing a hierarchical spatiotemporal vector quantization framework. The method uses two consecutive levels of vector quantization: the lower level maps raw skeleton sequences to fine-grained subactions, and the higher level aggregates those subactions into action-level representations. By jointly reconstructing the skeleton data and their corresponding timestamps, the model learns spatial and temporal cues together without supervision. To the best of our knowledge, this is the first effort to introduce hierarchical vector quantization to this task, integrating spatial and temporal cues and reducing segment length bias. The approach reports state-of-the-art performance on multiple benchmarks, including HuGaDB, LARa, and BABEL, and outperforms the non-hierarchical baseline.
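The two-level quantization described above can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-neighbour assignment under squared Euclidean distance, the random codebooks, the feature dimension, and the function names `quantize` and `hierarchical_quantize` are all assumptions made for illustration. In the actual method the codebooks and the encoder would be learned.

```python
import numpy as np


def quantize(features, codebook):
    """Assign each row of `features` to its nearest row of `codebook`.

    Distance is squared Euclidean (an assumption of this sketch).
    Returns the code indices and the selected code vectors.
    """
    # dists[i, k] = ||features[i] - codebook[k]||^2
    diffs = features[:, None, :] - codebook[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]


def hierarchical_quantize(frame_features, subaction_codebook, action_codebook):
    """Two consecutive quantization levels: frames -> subactions -> actions."""
    # Lower level: each frame feature is mapped to a fine-grained subaction code.
    sub_idx, sub_vecs = quantize(frame_features, subaction_codebook)
    # Higher level: each subaction code vector is mapped to an action code.
    act_idx, _ = quantize(sub_vecs, action_codebook)
    return sub_idx, act_idx


# Hypothetical sizes: 12 frames, 4-dim features, 6 subactions, 3 actions.
rng = np.random.default_rng(0)
frame_features = rng.normal(size=(12, 4))
subaction_codebook = rng.normal(size=(6, 4))
action_codebook = rng.normal(size=(3, 4))

sub_idx, act_idx = hierarchical_quantize(
    frame_features, subaction_codebook, action_codebook
)
print("subaction per frame:", sub_idx)
print("action per frame:   ", act_idx)
```

Because the higher level quantizes the subaction code vectors rather than the raw frames, every frame assigned to the same subaction receives the same action label in this sketch.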

📝 Abstract

We propose a novel hierarchical spatiotemporal vector quantization framework for unsupervised skeleton-based temporal action segmentation. We first introduce a hierarchical approach, which includes two consecutive levels of vector quantization. Specifically, the lower level associates skeletons with fine-grained subactions, while the higher level further aggregates subactions into action-level representations. Our hierarchical approach outperforms the non-hierarchical baseline, while primarily exploiting spatial cues by reconstructing input skeletons. Next, we extend our approach by leveraging both spatial and temporal information, yielding a hierarchical spatiotemporal vector quantization scheme. In particular, our hierarchical spatiotemporal approach performs multi-level clustering, while simultaneously recovering input skeletons and their corresponding timestamps. Lastly, extensive experiments on multiple benchmarks, including HuGaDB, LARa, and BABEL, demonstrate that our approach achieves new state-of-the-art performance and reduces segment length bias in unsupervised skeleton-based temporal action segmentation.
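The idea of recovering skeletons and timestamps at the same time can be sketched as a combined reconstruction objective. The abstract does not give the loss, so everything here is an assumption: mean squared error for both terms, timestamps normalized to [0, 1], and the hypothetical `time_weight` parameter that balances the two terms.

```python
import numpy as np


def joint_reconstruction_loss(skeletons, timestamps,
                              recon_skeletons, recon_timestamps,
                              time_weight=1.0):
    """Spatial term plus weighted temporal term (both MSE, an assumption).

    skeletons:  array of shape (T, J, 3), T frames with J joints each
    timestamps: array of shape (T,), e.g. frame positions normalized to [0, 1]
    """
    spatial = np.mean((skeletons - recon_skeletons) ** 2)
    temporal = np.mean((timestamps - recon_timestamps) ** 2)
    return spatial + time_weight * temporal


# Hypothetical sequence: 5 frames, 3 joints, 3D coordinates.
rng = np.random.default_rng(0)
skeletons = rng.normal(size=(5, 3, 3))
timestamps = np.linspace(0.0, 1.0, 5)

# A perfect reconstruction of both inputs gives zero loss.
perfect = joint_reconstruction_loss(skeletons, timestamps,
                                    skeletons, timestamps)

# Correct skeletons but timestamps off by 0.1 are penalized
# through the temporal term only.
shifted = joint_reconstruction_loss(skeletons, timestamps,
                                    skeletons, timestamps + 0.1,
                                    time_weight=2.0)
print(perfect, shifted)
```

The second call shows why a temporal term matters in such an objective: a model that reconstructs poses well but loses track of where a frame sits in the sequence still incurs a loss.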

Problem

Research questions and friction points this paper is trying to address.

unsupervised

skeleton-based

action segmentation

temporal action segmentation

vector quantization

Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical vector quantization

spatiotemporal modeling

unsupervised action segmentation