🤖 AI Summary
This work addresses the limitations of existing contrastive-learning approaches to skeleton-based human activity understanding, which struggle to model inter-class structural similarities and are susceptible to noisy positive samples. To overcome these challenges, the authors propose ACLNet, a framework that explicitly captures class clustering relationships by constructing activity superclasses through an affinity-based metric. The approach combines dynamic temperature scheduling with a margin-aware contrastive learning strategy to enhance feature discriminability, particularly for hard samples. ACLNet supports multi-task learning within a unified architecture and achieves consistent performance gains across multiple benchmarks, including NTU RGB+D 60/120, Kinetics-Skeleton, PKU-MMD, FineGYM, and CASIA-B, covering action recognition, gait recognition, and person re-identification.
📝 Abstract
In skeleton-based human activity understanding, existing methods often adopt the contrastive learning paradigm to construct a discriminative feature space. However, many of these approaches fail to exploit the structural inter-class similarities and overlook the impact of anomalous positive samples. In this study, we introduce ACLNet, an Affinity Contrastive Learning Network that explores the intricate clustering relationships among human activity classes to improve feature discrimination. Specifically, we propose an affinity metric to refine similarity measurements, thereby forming activity superclasses that provide more informative contrastive signals. A dynamic temperature schedule is also introduced to adaptively adjust the penalty strength for various superclasses. In addition, we employ a margin-based contrastive strategy to improve the separation of hard positive and negative samples within classes. Extensive experiments on NTU RGB+D 60, NTU RGB+D 120, Kinetics-Skeleton, PKU-MMD, FineGYM, and CASIA-B demonstrate the superiority of our method in skeleton-based action recognition, gait recognition, and person re-identification. The source code is available at https://github.com/firework8/ACLNet.
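To make the margin-and-temperature idea concrete, here is a minimal, self-contained sketch of a margin-aware InfoNCE-style loss on precomputed similarities. This is an illustrative toy, not the authors' implementation: the function name, the way the margin is subtracted from the positive logit, and the fixed temperature value are all assumptions; in ACLNet the temperature additionally varies per superclass according to the dynamic schedule.

```python
import math

def margin_contrastive_loss(pos_sim, neg_sims, tau=0.1, margin=0.2):
    """Toy margin-aware contrastive (InfoNCE-style) loss.

    pos_sim:  similarity between the anchor and its positive sample
    neg_sims: similarities between the anchor and negative samples
    tau:      temperature; a smaller tau penalizes hard negatives more sharply
    margin:   subtracted from the positive logit to force a separation gap
              (hypothetical placement, for illustration only)
    """
    pos_logit = (pos_sim - margin) / tau
    neg_logits = [s / tau for s in neg_sims]
    # Numerically stable log-sum-exp over the positive and all negatives.
    all_logits = [pos_logit] + neg_logits
    m = max(all_logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in all_logits))
    # Negative log-probability of picking the positive: always >= 0.
    return log_denom - pos_logit

# A larger margin makes the same positive pair look "harder",
# so the loss increases for identical similarities.
loss_no_margin = margin_contrastive_loss(0.9, [0.3, 0.2], margin=0.0)
loss_margin = margin_contrastive_loss(0.9, [0.3, 0.2], margin=0.2)
```

The design intent, as described in the abstract, is that the margin sharpens the separation of hard positives and negatives within a class, while the (here static, in the paper dynamic) temperature controls how strongly hard negatives are penalized.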