AI Summary
Existing self-supervised contrastive learning methods for skeleton-based action recognition typically treat skeletal regions uniformly and rely on FIFO queues to store negative samples, which loses motion detail and yields suboptimal negative sample selection. To address these issues, this paper proposes a dominance-game-based self-supervised contrastive learning framework. First, it models dynamic dominance relationships between positive and negative samples to enhance representation discriminability and semantic consistency. Second, it introduces spatio-temporal dual-dimensional weighted region localization and region-level data augmentation to preserve critical motion structures. Third, it incorporates an entropy-driven hard-negative memory bank with dynamic updating to improve negative sample quality. Extensive experiments demonstrate state-of-the-art performance: on NTU RGB+D 120, the method improves over prior art by 1.1% and 2.3% on the X-Sub and X-Set benchmarks, respectively, and on PKU-MMD Part II it achieves a 1.9% gain.
Abstract
Existing self-supervised contrastive learning methods for skeleton-based action recognition often process all skeleton regions uniformly and adopt a first-in-first-out (FIFO) queue to store negative samples, which leads to loss of motion information and suboptimal negative sample selection. To address these challenges, this paper proposes the Dominance-Game Contrastive Learning network for skeleton-based action Recognition (DoGCLR), a self-supervised framework based on game theory. DoGCLR models the construction of positive and negative samples as a dynamic Dominance Game, in which both sample types interact to reach an equilibrium that balances semantic preservation and discriminative strength. Specifically, a spatio-temporal dual-weight localization mechanism identifies key motion regions and guides region-wise augmentations to enhance motion diversity while preserving semantics. In parallel, an entropy-driven dominance strategy manages the memory bank by retaining high-entropy (hard) negatives and replacing low-entropy (weak) ones, ensuring consistent exposure to informative contrastive signals. Extensive experiments are conducted on the NTU RGB+D and PKU-MMD datasets. On NTU RGB+D 60 X-Sub/X-View, DoGCLR achieves 81.1%/89.4% accuracy, and on NTU RGB+D 120 X-Sub/X-Set it achieves 71.2%/75.5% accuracy, surpassing state-of-the-art methods by 0.1%, 2.7%, 1.1%, and 2.3%, respectively. On PKU-MMD Part I/Part II, DoGCLR performs on par with state-of-the-art methods and achieves 1.9% higher accuracy on Part II, highlighting its robustness in more challenging scenarios.
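To make the two mechanisms concrete, the sketch below shows one plausible reading of the spatio-temporal weighted region localization: motion magnitude serves as a proxy for frame- and joint-level importance, and augmentation is applied only to low-weight joints so that key motion regions keep their semantics. The function names, the motion-magnitude weighting, and the noise-based augmentation are illustrative assumptions, not the authors' implementation.

```python
import torch

def region_weights(x: torch.Tensor):
    # x: (C, T, V) skeleton sequence (channels, frames, joints).
    motion = (x[:, 1:] - x[:, :-1]).abs()      # frame-to-frame displacement, (C, T-1, V)
    t_w = motion.sum(dim=(0, 2))               # temporal importance per frame, (T-1,)
    v_w = motion.sum(dim=(0, 1))               # spatial importance per joint, (V,)
    return t_w / t_w.sum(), v_w / v_w.sum()

def region_augment(x: torch.Tensor, drop_ratio: float = 0.3) -> torch.Tensor:
    # Perturb only the least informative joints so key motion regions stay intact.
    _, v_w = region_weights(x)
    k = max(1, int(drop_ratio * v_w.numel()))
    low = v_w.topk(k, largest=False).indices   # indices of low-weight joints
    out = x.clone()
    out[:, :, low] = out[:, :, low] + 0.05 * torch.randn_like(out[:, :, low])
    return out
```

Similarly, a minimal sketch of the entropy-driven memory bank, assuming entropy is measured over each stored negative's similarity distribution to the current anchors: low-entropy (weak) entries are overwritten by incoming negatives rather than dequeued FIFO-style. Again, the class, the entropy proxy, and the replacement rule are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

class EntropyMemoryBank:
    def __init__(self, size: int, dim: int, temperature: float = 0.07):
        self.bank = F.normalize(torch.randn(size, dim), dim=1)  # stored negative embeddings
        self.t = temperature

    def entropy_scores(self, anchors: torch.Tensor) -> torch.Tensor:
        # Entropy of each stored negative's similarity distribution over the anchors;
        # higher entropy ~ harder / more ambiguous negative.
        sim = self.bank @ F.normalize(anchors, dim=1).T / self.t   # (size, batch)
        p = sim.softmax(dim=1)
        return -(p * p.clamp_min(1e-12).log()).sum(dim=1)          # (size,)

    @torch.no_grad()
    def update(self, anchors: torch.Tensor, new_negatives: torch.Tensor):
        # Overwrite the lowest-entropy (weak) slots instead of FIFO dequeueing.
        new_negatives = F.normalize(new_negatives, dim=1)
        n = min(new_negatives.shape[0], self.bank.shape[0])
        weakest = self.entropy_scores(anchors).topk(n, largest=False).indices
        self.bank[weakest] = new_negatives[:n]

    def negatives(self) -> torch.Tensor:
        return self.bank
```

In a MoCo-style training loop, `update(anchor_embeddings, key_embeddings)` would be called after each step and `negatives()` would supply the negative terms of the contrastive (InfoNCE) loss.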