Point-Supervised Skeleton-Based Human Action Segmentation

📅 2026-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of fully supervised approaches to skeleton-based action segmentation, which rely on costly frame-level annotations and are sensitive to ambiguous action boundaries. To mitigate these issues, the paper introduces a point-supervised paradigm that requires only a single annotated frame per action segment. Multimodal skeleton features (joint, bone, and motion) are extracted with a pretrained unified model. Pseudo-labels are generated by a proposed prototype similarity method together with two existing techniques, an energy function and constrained K-Medoids clustering, and the pseudo-labels from the different modalities are then integrated to guide training. The authors establish point-supervised benchmarks and baselines on PKU-MMD (X-Sub and X-View), MCFS-22, and MCFS-130, and report competitive performance that even surpasses some fully supervised methods while substantially reducing annotation effort.
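The joint, bone, and motion modalities mentioned above are commonly derived from raw joint coordinates in skeleton-based work. The sketch below shows one conventional way to do so and is not taken from the paper: the helper names `bone_stream` and `motion_stream`, the three-joint toy skeleton, and the `parents` list are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) of one joint
Frame = List[Joint]                  # all joints of one frame


def bone_stream(frames: Sequence[Frame], parents: Sequence[int]) -> List[Frame]:
    """Bone vector of each joint: its position minus its parent's position.

    The root joint is listed as its own parent, so its bone is zero.
    """
    return [
        [
            tuple(c - p for c, p in zip(frame[j], frame[parents[j]]))
            for j in range(len(frame))
        ]
        for frame in frames
    ]


def motion_stream(frames: Sequence[Frame]) -> List[Frame]:
    """Displacement of each joint to the next frame; the last frame is zero-padded."""
    out: List[Frame] = []
    for t, frame in enumerate(frames):
        if t + 1 < len(frames):
            nxt = frames[t + 1]
            out.append(
                [tuple(b - a for a, b in zip(frame[j], nxt[j])) for j in range(len(frame))]
            )
        else:
            out.append([(0.0, 0.0, 0.0)] * len(frame))
    return out


# Hypothetical toy skeleton: a chain of three joints, joint 0 is the root.
parents = [0, 0, 1]
frames = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)],
    [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 2.5, 0.0)],
]
bones = bone_stream(frames, parents)
motion = motion_stream(frames)
```

Each stream keeps the same frames-by-joints shape as the input, so the three modalities can be fed to the same encoder or processed in parallel.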

📝 Abstract
Skeleton-based temporal action segmentation is a fundamental yet challenging task, playing a crucial role in enabling intelligent systems to perceive and respond to human activities. While fully-supervised methods achieve satisfactory performance, they require costly frame-level annotations and are sensitive to ambiguous action boundaries. To address these issues, we introduce a point-supervised framework for skeleton-based action segmentation, where only a single frame per action segment is labeled. We leverage multimodal skeleton data, including joint, bone, and motion information, encoded via a pretrained unified model to extract rich feature representations. To generate reliable pseudo-labels, we propose a novel prototype similarity method and integrate it with two existing methods: energy function and constrained K-Medoids clustering. Multimodal pseudo-label integration is proposed to enhance the reliability of the pseudo-label and guide the model training. We establish new benchmarks on PKU-MMD (X-Sub and X-View), MCFS-22, and MCFS-130, and implement baselines for point-supervised skeleton-based human action segmentation. Extensive experiments show that our method achieves competitive performance, even surpassing some fully-supervised methods while significantly reducing annotation effort.
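To make the idea of turning point annotations into frame-level pseudo-labels more concrete, here is a minimal generic sketch based on prototype similarity. It is not the paper's formulation, whose details are not given in the abstract. The assumptions are: per-frame feature vectors are already available, each class prototype is the mean feature of that class's annotated frames, and between two consecutive annotated frames a single boundary is chosen to maximize total cosine similarity to the two prototypes. All function names and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def class_prototypes(
    features: Sequence[Vector], points: Sequence[Tuple[int, str]]
) -> Dict[str, List[float]]:
    """Average the features of the annotated frames of each class."""
    grouped: Dict[str, List[Vector]] = {}
    for t, label in points:
        grouped.setdefault(label, []).append(features[t])
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def pseudo_labels(
    features: Sequence[Vector], points: Sequence[Tuple[int, str]]
) -> List[str]:
    """points: (frame_index, label) pairs sorted by frame, one per action segment."""
    protos = class_prototypes(features, points)
    labels = [points[0][1]] * len(features)  # covers frames before the first point
    for (t0, a), (t1, b) in zip(points, points[1:]):
        # Try every boundary k: frames t0..k-1 take label a, frames k..t1 take label b.
        best_k, best_score = t0 + 1, -math.inf
        for k in range(t0 + 1, t1 + 1):
            score = sum(cosine(features[t], protos[a]) for t in range(t0, k))
            score += sum(cosine(features[t], protos[b]) for t in range(k, t1 + 1))
            if score > best_score:
                best_k, best_score = k, score
        for t in range(t0, t1 + 1):
            labels[t] = a if t < best_k else b
    for t in range(points[-1][0], len(features)):
        labels[t] = points[-1][1]  # frames after the last point
    return labels


# Hypothetical toy input: six 2-D frame features, one labeled frame per segment.
features = [(1.0, 0.0), (0.9, 0.1), (0.8, 0.2), (0.2, 0.8), (0.1, 0.9), (0.0, 1.0)]
points = [(0, "wave"), (5, "clap")]
labels = pseudo_labels(features, points)
```

Restricting the search to one boundary between each pair of annotated frames guarantees that every annotated frame keeps its label and that the segment order implied by the annotations is preserved.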
Problem

Research questions and friction points this paper is trying to address.

skeleton-based action segmentation
point-supervised learning
temporal action segmentation
annotation efficiency
pseudo-labeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

point-supervised learning
skeleton-based action segmentation
pseudo-label generation
multimodal fusion
prototype similarity
Hongsong Wang
Southeast University
Yiqin Shen
Southeast University
Pengbo Yan
Southeast University
Jie Gui
Southeast University, China
Pattern Recognition and Machine Learning, Artificial Intelligence, Data Mining, Deep Learning, Image Processing and Computer Vis