Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition

📅 2024-11-28
🏛️ arXiv.org
📈 Citations: 2
✨ Influential: 1
🤖 AI Summary
To address the challenge of distinguishing actions with highly similar joint trajectories in skeleton-based action recognition, this paper proposes ProtoGCN, a framework that models fine-grained motion differences within local skeletal components. Methodologically, it introduces learnable motion prototypes that decompose skeleton sequences into combinations of discriminative action units, and it jointly employs prototype reconstruction and contrastive learning to enhance feature discriminability. Built on a graph convolutional network (GCN) backbone, ProtoGCN integrates prototype learning, local motion modeling, and reconstruction-guided optimization. Extensive experiments demonstrate state-of-the-art performance on four major benchmarks (NTU RGB+D, NTU RGB+D 120, Kinetics-Skeleton, and FineGYM), notably improving fine-grained action discrimination and validating motion prototyping as a decomposition paradigm for skeleton-based action recognition.

๐Ÿ“ Abstract
In skeleton-based action recognition, a key challenge is distinguishing between actions with similar trajectories of joints due to the lack of image-level details in skeletal representations. Recognizing that the differentiation of similar actions relies on subtle motion details in specific body parts, we direct our approach to focus on the fine-grained motion of local skeleton components. To this end, we introduce ProtoGCN, a Graph Convolutional Network (GCN)-based model that breaks down the dynamics of entire skeleton sequences into a combination of learnable prototypes representing core motion patterns of action units. By contrasting the reconstruction of prototypes, ProtoGCN can effectively identify and enhance the discriminative representation of similar actions. Without bells and whistles, ProtoGCN achieves state-of-the-art performance on multiple benchmark datasets, including NTU RGB+D, NTU RGB+D 120, Kinetics-Skeleton, and FineGYM, which demonstrates the effectiveness of the proposed method. The code is available at https://github.com/firework8/ProtoGCN.
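The core mechanism the abstract describes, representing a motion feature as a combination of learnable prototypes, can be sketched in plain NumPy. This is a minimal illustration of the general prototype-reconstruction idea under assumed shapes and names (`prototype_reconstruct`, `features`, `prototypes` are all hypothetical), not the paper's actual implementation:

```python
import numpy as np

def prototype_reconstruct(features, prototypes):
    """Reconstruct each feature vector as a softmax-weighted (convex)
    combination of prototype vectors.

    features:   (N, d) array of per-sample motion features
    prototypes: (K, d) array of learnable prototype vectors
    returns:    (N, d) reconstructed features
    """
    # Similarity of each feature to each prototype: (N, K)
    sims = features @ prototypes.T
    # Softmax over prototypes gives soft assignment weights per sample
    w = np.exp(sims - sims.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    # Reconstruction as a weighted sum of prototypes: (N, d)
    return w @ prototypes

# Tiny example: 4 samples with 8-dim features, 3 prototypes
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8))
protos = rng.standard_normal((3, 8))
recon = prototype_reconstruct(feats, protos)
```

In a full model the prototypes would be trainable parameters learned end-to-end, and, as the abstract notes, contrasting the reconstructions of similar actions is what sharpens the discriminative motion details.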
Problem

Research questions and friction points this paper is trying to address.

Distinguishing similar actions in skeleton-based recognition
Focusing on fine-grained motion of local skeleton components
Enhancing discriminative representation using learnable motion prototypes
Innovation

Methods, ideas, or system contributions that make the work stand out.

ProtoGCN uses a Graph Convolutional Network (GCN) for action recognition.
It focuses on the fine-grained motion of local skeleton components.
It achieves state-of-the-art performance on multiple benchmark datasets.
🔎 Similar Papers
No similar papers found.
👥 Authors

Hongda Liu, Sun Yat-sen University. Interests: Computer Vision, Low-level Vision, Image Restoration, Style Transfer
Yunfan Liu, University of Chinese Academy of Sciences
Min Ren, Continental Advanced Lidar Solutions US, LLC. Interests: Photonics, Avalanche Photodiodes, Single Photon Detectors, Single Photon Detection, Lidar
Hao Wang, NLPR, Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Yunlong Wang, NLPR, Institute of Automation, Chinese Academy of Sciences
Zhe Sun, NLPR, Institute of Automation, Chinese Academy of Sciences