Informative Sample Selection Model for Skeleton-based Action Recognition with Limited Training Samples

📅 2025-10-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the high annotation cost and data scarcity in 3D skeleton-based action recognition, this paper proposes a novel active learning framework grounded in Markov Decision Processes (MDPs). Methodologically, it is the first to formulate sample selection for semi-supervised skeleton action recognition as an MDP, leveraging hyperspherical embedding for discriminative feature representation and incorporating a meta-tuning strategy to dynamically optimize the selection policy—thereby precisely identifying truly informative unlabeled sequences. Technically, the approach integrates an encoder-decoder architecture, multi-head boundary evaluation, and hyperbolic space modeling to enhance generalization under low-data regimes. Extensive experiments on three benchmarks—NTU RGB+D, NTU-120, and PKU-MMD—demonstrate that the method substantially outperforms existing active learning approaches, achieving average accuracy gains of 3.2–5.7% using only 10% labeled data, thereby effectively alleviating the annotation bottleneck.

Technology Category

Application Category

📝 Abstract
Skeleton-based human action recognition aims to classify human skeletal sequences, which are spatiotemporal representations of actions, into predefined categories. To reduce the reliance on costly annotations of skeletal sequences while maintaining competitive recognition accuracy, the task of 3D Action Recognition with Limited Training Samples, also known as semi-supervised 3D Action Recognition, has been proposed. In addition, active learning, which aims to proactively select the most informative unlabeled samples for annotation, has been explored in semi-supervised 3D Action Recognition for training sample selection. Specifically, researchers adopt an encoder-decoder framework to embed skeleton sequences into a latent space, where clustering information, combined with a margin-based selection strategy using a multi-head mechanism, is utilized to identify the most informative sequences in the unlabeled set for annotation. However, the most representative skeleton sequences may not necessarily be the most informative for the action recognizer, as the model may have already acquired similar knowledge from previously seen skeleton samples. To solve it, we reformulate Semi-supervised 3D action recognition via active learning from a novel perspective by casting it as a Markov Decision Process (MDP). Built upon the MDP framework and its training paradigm, we train an informative sample selection model to intelligently guide the selection of skeleton sequences for annotation. To enhance the representational capacity of the factors in the state-action pairs within our method, we project them from Euclidean space to hyperbolic space. Furthermore, we introduce a meta tuning strategy to accelerate the deployment of our method in real-world scenarios. Extensive experiments on three 3D action recognition benchmarks demonstrate the effectiveness of our method.
Problem

Research questions and friction points this paper is trying to address.

Selecting informative skeleton sequences for annotation with limited training samples
Improving semi-supervised 3D action recognition through active learning
Enhancing sample selection using hyperbolic space projection and meta tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Markov Decision Process for sample selection
Projects state-action pairs into hyperbolic space
Introduces meta tuning strategy for deployment
🔎 Similar Papers
No similar papers found.