🤖 AI Summary
Existing skeleton-based zero-shot action recognition methods rely on static, class-level semantic alignment, which inadequately mitigates the domain shift between seen and unseen classes and hinders fine-grained knowledge transfer. To address this, we propose a dynamic multi-scale vision–semantics matching framework: (1) an inference-time dynamic text-feature refinement mechanism that improves semantic alignment precision; (2) hierarchical action descriptions generated by large language models, integrated with adaptive joint grouping for structure-aware skeletal modeling; and (3) a confidence-aware, class-balanced memory bank that suppresses pseudo-label noise. Our method achieves new state-of-the-art results on NTU RGB+D 60/120 and PKU-MMD, demonstrating significantly improved zero-shot generalization. The code is publicly available.
📝 Abstract
Zero-shot skeleton-based action recognition (ZS-SAR) is fundamentally constrained by prevailing approaches that align skeleton features with static, class-level semantics. This coarse-grained alignment fails to bridge the domain shift between seen and unseen classes, thereby impeding the effective transfer of fine-grained visual knowledge. To address these limitations, we introduce **DynaPURLS**, a unified framework that establishes robust, multi-scale visual-semantic correspondences and dynamically refines them at inference time to enhance generalization. Our framework leverages a large language model to generate hierarchical textual descriptions that encompass both global movements and local body-part dynamics. Concurrently, an adaptive partitioning module produces fine-grained visual representations by semantically grouping skeleton joints. To fortify this fine-grained alignment against the train-test domain shift, DynaPURLS incorporates a dynamic refinement module: during inference, it adapts textual features to the incoming visual stream via a lightweight learnable projection. This refinement is stabilized by a confidence-aware, class-balanced memory bank, which mitigates error propagation from noisy pseudo-labels. Extensive experiments on three large-scale benchmarks, NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD, demonstrate that DynaPURLS significantly outperforms prior methods, setting new state-of-the-art results. The source code is publicly available at https://github.com/Alchemist0754/DynaPURLS
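To make the inference-time loop concrete, below is a minimal, self-contained sketch of the two mechanisms the abstract names: a lightweight residual projection that refines class text features against the incoming visual stream, and a confidence-aware, class-balanced memory bank that gates noisy pseudo-labels. This is an illustrative toy on synthetic features, not the paper's implementation; the residual form `t + W t`, the squared-error alignment objective, the FIFO per-class eviction, and every hyperparameter (capacity, confidence threshold, learning rate, temperature) are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2norm(x):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

class BalancedMemoryBank:
    """Confidence-aware, class-balanced memory: keeps at most `cap`
    high-confidence visual features per pseudo-class (assumed FIFO eviction)."""
    def __init__(self, num_classes, cap=4, tau=0.5):
        self.bank = {c: [] for c in range(num_classes)}
        self.cap, self.tau = cap, tau

    def maybe_add(self, feat, cls, conf):
        if conf < self.tau:          # reject low-confidence pseudo-labels
            return False
        slot = self.bank[cls]
        slot.append(feat)
        if len(slot) > self.cap:     # per-class budget keeps the bank balanced
            slot.pop(0)
        return True

class DynamicTextRefiner:
    """Lightweight learnable projection, assumed residual: t' = t + W t."""
    def __init__(self, dim, lr=0.05):
        self.W = np.zeros((dim, dim))
        self.lr = lr

    def refine(self, T):
        return l2norm(T + T @ self.W.T)

    def step(self, t, v):
        # One gradient step on ||(t + W t) - v||^2 w.r.t. W,
        # pulling the refined text feature toward the memory anchor v.
        r = (t + self.W @ t) - v
        self.W -= self.lr * 2.0 * np.outer(r, t)

# --- toy test-time stream (synthetic stand-ins for skeleton/text features) ---
d, C = 16, 3
T = l2norm(rng.normal(size=(C, d)))              # class text embeddings
bank = BalancedMemoryBank(num_classes=C)
refiner = DynamicTextRefiner(dim=d)

loss_before = loss_after = 0.0
for _ in range(40):
    c = rng.integers(C)                          # ground truth (unseen at test)
    v = l2norm(T[c] + 0.3 * rng.normal(size=d))  # visual feature near its class
    sims = refiner.refine(T) @ v                 # cosine similarities
    probs = np.exp(5 * sims) / np.exp(5 * sims).sum()
    pred, conf = int(probs.argmax()), float(probs.max())
    if bank.maybe_add(v, pred, conf):
        anchor = np.mean(bank.bank[pred], axis=0)   # class-balanced target
        t = T[pred]
        loss_before += np.sum(((t + refiner.W @ t) - anchor) ** 2)
        refiner.step(t, anchor)
        loss_after += np.sum(((t + refiner.W @ t) - anchor) ** 2)
```

Each accepted sample tightens the text-to-visual alignment (the per-step loss strictly decreases for the updated class), while the capacity cap and confidence threshold prevent a few noisy, over-represented pseudo-classes from dominating the adaptation.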