Parts-Mamba: Augmenting Joint Context with Part-Level Scanning for Occluded Human Skeleton

📅 2025-11-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing graph convolutional network (GCN) models for skeleton-based action recognition suffer from performance degradation under joint/frame occlusion or transmission loss, primarily due to insufficient modeling of long-range spatial dependencies. To address this, we propose a hybrid architecture integrating graph convolution with the Mamba state-space model. Our key contributions are: (1) a part-level scanning mechanism that structurally traverses joint sequences according to anatomical parts (e.g., arms, legs), explicitly capturing non-local spatial correlations among distant joints; and (2) a part-to-whole fusion module that jointly optimizes local part-specific features and global skeletal representations. Evaluated on NTU RGB+D 60/120 under diverse occlusion settings, our method significantly improves robustness to incomplete skeletons, achieving up to a 12.9% absolute accuracy gain over baselines—demonstrating its effectiveness in modeling corrupted or partial skeletal data.

📝 Abstract
Skeleton action recognition involves recognizing human actions from human skeletons. The use of graph convolutional networks (GCNs) has driven major advances in this recognition task. In real-world scenarios, the captured skeletons are not always perfect or complete because of occlusions of parts of the human body or poor communication quality, leading to skeletons with missing parts or videos with missing frames. In the presence of such non-idealities, existing GCN models perform poorly due to missing local context. To address this limitation, we propose Parts-Mamba, a hybrid GCN-Mamba model designed to enhance the ability to capture and maintain contextual information from distant joints. The proposed Parts-Mamba model effectively captures part-specific information through its parts-specific scanning feature and preserves non-neighboring joint context via a parts-body fusion module. Our proposed model is evaluated on the NTU RGB+D 60 and NTU RGB+D 120 datasets under different occlusion settings, achieving up to 12.9% improvement in accuracy.
Problem

Research questions and friction points this paper is trying to address.

Recognizing human actions from incomplete skeletons due to occlusions
Addressing poor performance of GCN models with missing local context
Enhancing contextual information capture from distant joints in occluded scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid GCN-Mamba model for occluded skeleton recognition
Part-specific scanning to capture distant joint context
Parts-body fusion module preserving non-neighboring joint information
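The two contributions above can be sketched in miniature. The example below is a hedged illustration, not the paper's implementation: the part grouping `PARTS` uses illustrative joint indices (not necessarily the NTU RGB+D layout), `toy_ssm_scan` is a plain linear recurrence standing in for a Mamba block, and the fusion is a naive average rather than the paper's learned parts-body fusion module.

```python
import numpy as np

# Hypothetical anatomical part grouping for a 25-joint skeleton.
# Joint indices are illustrative assumptions, not the paper's exact layout.
PARTS = {
    "left_arm":  [4, 5, 6, 7],
    "right_arm": [8, 9, 10, 11],
    "left_leg":  [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
    "torso":     [0, 1, 2, 3, 20, 21, 22, 23, 24],
}

def toy_ssm_scan(x, decay=0.9):
    """Minimal linear state-space recurrence h_t = decay * h_{t-1} + x_t,
    a toy stand-in for a Mamba block. x: (L, C) -> (L, C)."""
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = decay * h + x[t]
        out[t] = h
    return out

def part_level_scan(skeleton):
    """skeleton: (V, C) joint features for one frame.

    Part-level scanning: each anatomical part is traversed as its own
    joint sequence, so distant joints within a part share context even
    when intermediate joints are occluded. A whole-body scan is then
    averaged in as a naive stand-in for part-to-whole fusion."""
    part_feats = skeleton.copy()
    for joints in PARTS.values():
        part_feats[joints] = toy_ssm_scan(skeleton[joints])
    whole_feats = toy_ssm_scan(skeleton)      # scan over all joints in order
    return 0.5 * (part_feats + whole_feats)   # naive parts-body fusion

fused = part_level_scan(np.random.randn(25, 8))
print(fused.shape)  # (25, 8)
```

In a real model the per-part and whole-body scans would be learned selective state-space layers stacked with graph convolutions, and the fusion would be a trainable module rather than an average; the sketch only shows the data flow of scanning by parts and merging back into a whole-skeleton representation.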