Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition

📅 2024-11-28
🏛️ arXiv.org
📈 Citations: 2
✨ Influential: 1
🤖 AI Summary
To address the challenge of distinguishing actions with highly similar joint trajectories in skeleton-based action recognition, this paper proposes ProtoGCN, a framework that models fine-grained motion differences within local skeletal components. Methodologically, it introduces learnable motion prototypes that decompose skeleton sequences into combinations of discriminative action units, and it jointly employs prototype reconstruction and contrastive learning to enhance feature discriminability. Built on a graph convolutional network (GCN) backbone, ProtoGCN integrates prototype learning, local motion modeling, and reconstruction-guided optimization. Extensive experiments demonstrate state-of-the-art performance on four major benchmarks (NTU RGB+D, NTU RGB+D 120, Kinetics-Skeleton, and FineGYM), notably improving fine-grained action discrimination and validating motion prototyping as a decomposition paradigm for skeleton-based action recognition.

๐Ÿ“ Abstract
In skeleton-based action recognition, a key challenge is distinguishing between actions with similar trajectories of joints due to the lack of image-level details in skeletal representations. Recognizing that the differentiation of similar actions relies on subtle motion details in specific body parts, we direct our approach to focus on the fine-grained motion of local skeleton components. To this end, we introduce ProtoGCN, a Graph Convolutional Network (GCN)-based model that breaks down the dynamics of entire skeleton sequences into a combination of learnable prototypes representing core motion patterns of action units. By contrasting the reconstruction of prototypes, ProtoGCN can effectively identify and enhance the discriminative representation of similar actions. Without bells and whistles, ProtoGCN achieves state-of-the-art performance on multiple benchmark datasets, including NTU RGB+D, NTU RGB+D 120, Kinetics-Skeleton, and FineGYM, which demonstrates the effectiveness of the proposed method. The code is available at https://github.com/firework8/ProtoGCN.
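The core mechanism the abstract describes, representing a motion feature as a combination of learnable prototypes, can be sketched in plain NumPy. This is a minimal illustration of the general prototype-reconstruction idea under assumed shapes and names (`prototype_reconstruct`, `features`, `prototypes` are all hypothetical), not the paper's actual implementation:

```python
import numpy as np

def prototype_reconstruct(features, prototypes):
    """Reconstruct each feature vector as a softmax-weighted (convex)
    combination of prototype vectors.

    features:   (N, d) array of per-sample motion features
    prototypes: (K, d) array of learnable prototype vectors
    returns:    (N, d) reconstructed features
    """
    # Similarity of each feature to each prototype: (N, K)
    sims = features @ prototypes.T
    # Softmax over prototypes gives soft assignment weights per sample
    w = np.exp(sims - sims.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    # Reconstruction as a weighted sum of prototypes: (N, d)
    return w @ prototypes

# Tiny example: 4 samples with 8-dim features, 3 prototypes
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8))
protos = rng.standard_normal((3, 8))
recon = prototype_reconstruct(feats, protos)
```

In a full model the prototypes would be trainable parameters learned end-to-end, and, as the abstract notes, contrasting the reconstructions of similar actions is what sharpens the discriminative motion details.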
Problem

Research questions and friction points this paper is trying to address.

Distinguishing similar actions in skeleton-based recognition
Focusing on fine-grained motion of local skeleton components
Enhancing discriminative representation using learnable motion prototypes
Innovation

Methods, ideas, or system contributions that make the work stand out.

ProtoGCN uses a Graph Convolutional Network (GCN) for action recognition.
It focuses on the fine-grained motion of local skeleton components.
It achieves state-of-the-art performance on multiple benchmark datasets.
🔎 Similar Papers
No similar papers found.
👥 Authors

Hongda Liu, Sun Yat-sen University. Interests: Computer Vision, Low-level Vision, Image Restoration, Style Transfer
Yunfan Liu, University of Chinese Academy of Sciences
Min Ren, Continental Advanced Lidar Solutions US, LLC. Interests: Photonics, Avalanche Photodiodes, Single Photon Detectors, Single Photon Detection, Lidar
Hao Wang, NLPR, Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Yunlong Wang, NLPR, Institute of Automation, Chinese Academy of Sciences
Zhe Sun, NLPR, Institute of Automation, Chinese Academy of Sciences