Spatio-Temporal Joint Density Driven Learning for Skeleton-Based Action Recognition

📅 2025-05-29
🏛️ IEEE Transactions on Biometrics, Behavior, and Identity Science
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Problem: Existing skeleton-based action recognition methods often neglect the discriminative synergy between dynamic (moving) and static (supporting) joints, which limits representation learning.
Method: This paper introduces Spatio-Temporal Joint Density (STJD), a novel metric that explicitly models the co-evolution of dynamic and static joints and adaptively identifies discriminative “prime joints.” Building on STJD, the authors propose STJD-CL, a contrastive learning strategy, and STJD-MP, a reconstruction-augmented framework, enabling self-supervised representation learning driven by motion-static interaction. The approach uses a graph convolutional network backbone.
Results: On NTU RGB+D 120, it achieves state-of-the-art performance, outperforming prior contrastive methods by 3.5 and 3.6 percentage points under the cross-subject (X-sub) and cross-setup (X-set) protocols, respectively. It also consistently surpasses all competitors on NTU RGB+D 60 and PKU-MMD. These results demonstrate the effectiveness and generalizability of modeling dynamic-static joint synergy for skeleton-based action recognition.

πŸ“ Abstract
Traditional approaches to unsupervised or self-supervised learning for skeleton-based action classification have concentrated predominantly on the dynamic aspects of skeletal sequences. Yet the intricate interaction between the moving and static elements of the skeleton offers a rarely tapped discriminative potential for action classification. This paper introduces a novel measurement, referred to as spatial-temporal joint density (STJD), to quantify such interaction. Tracking the evolution of this density throughout an action can effectively identify a subset of discriminative moving and/or static joints, termed "prime joints," to steer self-supervised learning. A new contrastive learning strategy named STJD-CL is proposed to align the representation of a skeleton sequence with that of its prime joints while simultaneously contrasting the representations of prime and non-prime joints. In addition, a method called STJD-MP is developed by integrating it with a reconstruction-based framework for more effective learning. Experimental evaluations on the NTU RGB+D 60, NTU RGB+D 120, and PKUMMD datasets in various downstream tasks demonstrate that the proposed STJD-CL and STJD-MP improve performance, most notably by 3.5 and 3.6 percentage points over state-of-the-art contrastive methods on NTU RGB+D 120 under the X-sub and X-set evaluations, respectively.
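
The abstract does not give the formula for STJD, so the following is only a minimal sketch of the general idea of picking moving and static "prime joints" from a skeleton sequence. It swaps in a hypothetical motion-threshold heuristic in place of the paper's density measure: each joint is ranked by the fraction of frame transitions in which its displacement exceeds a threshold `tau`. The function name `select_prime_joints` and the parameters `tau`, `k_dynamic`, and `k_static` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def select_prime_joints(seq, tau=0.01, k_dynamic=3, k_static=2):
    """Toy prime-joint selector: a hypothetical stand-in for STJD.

    seq: array of shape (T, V, C) with T frames, V joints, C coordinates.
    Returns the sorted indices of the selected "prime" joints.
    """
    seq = np.asarray(seq, dtype=float)
    # Displacement magnitude of every joint between consecutive frames: (T-1, V).
    speed = np.linalg.norm(np.diff(seq, axis=0), axis=-1)
    # Fraction of frame transitions in which each joint counts as moving: (V,).
    moving_ratio = (speed > tau).mean(axis=0)
    # Rank joints from most static to most dynamic (stable for ties).
    order = np.argsort(moving_ratio, kind="stable")
    dynamic = order[::-1][:k_dynamic]  # joints that move most often
    static = order[:k_static]          # joints that stay still most often
    # np.unique merges any overlap and returns sorted indices.
    return np.unique(np.concatenate([dynamic, static]))


# Example: 5 frames, 6 joints in 3-D; only joints 0 and 1 move.
seq = np.zeros((5, 6, 3))
seq[:, 0, 0] = np.arange(5) * 0.1
seq[:, 1, 1] = np.arange(5) * 0.2
print(select_prime_joints(seq, k_dynamic=2, k_static=1).tolist())  # [0, 1, 2]
```

Keeping both the most dynamic and the most static joints mirrors the abstract's "moving and/or static" wording; the actual STJD criterion may weigh the two groups differently or select them adaptively per action.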
Problem

Research questions and friction points this paper is trying to address.

Quantify spatial-temporal joint density for skeleton action recognition
Identify discriminative prime joints for self-supervised learning
Improve contrastive learning performance on skeleton-based datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

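Spatial-temporal joint density (STJD) measures the interaction between moving and static skeleton joints
STJD-CL contrastively aligns a skeleton sequence's representation with that of its prime joints
STJD-MP integrates a reconstruction-based framework for more effective learning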
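
As a hedged illustration of the STJD-CL objective (align a sequence with its prime joints, contrast prime against non-prime joints), the sketch below uses a generic InfoNCE-style loss in NumPy. The paper's exact loss, encoders, and temperature are not stated here; the function `prime_contrastive_loss`, the choice of the prime-joint embedding as the anchor, and `temperature=0.1` are assumptions for illustration only.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    # Scale each row to (approximately) unit length.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def prime_contrastive_loss(z_seq, z_prime, z_nonprime, temperature=0.1):
    """InfoNCE-style sketch; hypothetical, not the paper's exact STJD-CL loss.

    All inputs are (B, D) embeddings for the same B sequences.
    Anchor:    prime-joint embedding p_i.
    Positive:  full-sequence embedding z_i of the same sample.
    Negatives: z_j of the other samples (j != i) and the non-prime embedding n_i.
    """
    z = l2_normalize(np.asarray(z_seq, float))
    p = l2_normalize(np.asarray(z_prime, float))
    n = l2_normalize(np.asarray(z_nonprime, float))
    # (B, B) cosine similarities between each prime anchor and every sequence.
    sim_seq = p @ z.T / temperature
    # (B, 1) similarity between each prime anchor and its own non-prime view.
    sim_non = np.sum(p * n, axis=1, keepdims=True) / temperature
    logits = np.concatenate([sim_seq, sim_non], axis=1)  # (B, B + 1)
    # Numerically stable log-softmax; the positive for row i sits in column i.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(z))
    return float(-log_prob[idx, idx].mean())


z = np.eye(4)  # 4 toy samples with orthogonal 4-D embeddings
aligned = prime_contrastive_loss(z, z, -z)
shuffled = prime_contrastive_loss(z, np.roll(z, 1, axis=0), -z)
print(aligned < shuffled)  # True
```

Each row treats the sample's own full-sequence embedding as the positive, and treats the other sequences in the batch plus the same sample's non-prime embedding as negatives. Minimizing this loss therefore pulls prime-joint and sequence representations together while pushing prime and non-prime representations apart.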
πŸ”Ž Similar Papers
No similar papers found.

Shanaka Ramesh Gunasekara
Advanced Multimedia Research Lab, University of Wollongong, Australia

Wanqing Li
Professor, University of Wollongong
Multimedia Understanding, Computer Vision, Machine Learning

P. Ogunbona
Advanced Multimedia Research Lab, University of Wollongong, Australia

Jack Yang
Senior Lecturer, University of New South Wales
Computational Material Science