🤖 AI Summary
To address the low recognition accuracy of conventional gait recognition methods in complex outdoor scenarios—caused by the insufficient information entropy of skeletal representations—this paper proposes the Parsing Skeleton (PS) representation and a multimodal framework, PSGait. PSGait introduces a novel skeleton-guided human parsing mechanism that generates high-entropy, fine-grained skeletal sequences with enhanced dynamic modeling capability. It also designs a plug-and-play multimodal fusion architecture with dual-stream input (silhouette images and parsing skeletons), enabling performance improvements without retraining the underlying gait model. Key innovations include skeleton-guided feature alignment, dual-stream spatiotemporal convolution, and modality-level cascaded weighted fusion. Extensive experiments on multiple benchmark datasets demonstrate that PSGait significantly outperforms state-of-the-art multimodal approaches, achieving up to a 10.9% improvement in Rank-1 accuracy. These results validate its robustness and generalizability in real-world environments.
📝 Abstract
Gait recognition has emerged as a robust biometric modality due to its non-intrusive nature and resilience to occlusion. Conventional gait recognition methods typically rely on silhouettes or skeletons. Despite their success in controlled laboratory environments, they usually fail in real-world scenarios because of the limited information entropy of their gait representations. To achieve accurate gait recognition in the wild, we propose a novel gait representation, named Parsing Skeleton. This representation innovatively introduces a skeleton-guided human parsing method to capture fine-grained body dynamics, so it has much higher information entropy to encode the shapes and dynamics of fine-grained human parts during walking. Moreover, to effectively exploit the parsing skeleton representation, we propose a novel parsing skeleton-based gait recognition framework, named PSGait, which takes parsing skeletons and silhouettes as input. The two modalities are fused, and the resulting image sequences are fed into gait recognition models for enhanced individual differentiation. We conduct comprehensive benchmarks on various datasets to evaluate our model. PSGait outperforms existing state-of-the-art multimodal methods. Furthermore, as a plug-and-play method, PSGait yields up to a 10.9% improvement in Rank-1 accuracy across various gait recognition models. These results demonstrate the effectiveness and versatility of parsing skeletons for gait recognition in the wild, establishing PSGait as a new state-of-the-art approach for multimodal gait recognition.
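To make the dual-stream idea concrete, here is a minimal sketch of fusing a silhouette sequence with a parsing-skeleton sequence at the frame level before handing the result to a downstream gait model. The function name, the fixed scalar weights, and the simple weighted-sum fusion are illustrative assumptions — the paper's cascaded weighted fusion and feature alignment are learned, not hand-set as done here.

```python
import numpy as np

def fuse_modalities(silhouette, parsing_skeleton, w_sil=0.5, w_ps=0.5):
    """Weighted pixel-level fusion of one silhouette frame and one
    parsing-skeleton frame (both H x W float arrays in [0, 1]).

    NOTE: fixed weights are a stand-in for the learned, modality-level
    weighted fusion described in the paper.
    """
    assert silhouette.shape == parsing_skeleton.shape
    fused = w_sil * silhouette + w_ps * parsing_skeleton
    return np.clip(fused, 0.0, 1.0)

# Toy walking sequence: T frames at 64x44, a common gait input size.
T, H, W = 4, 64, 44
rng = np.random.default_rng(0)
sils = rng.random((T, H, W))       # binary-ish silhouette frames
ps = rng.random((T, H, W))         # parsing-skeleton frames
fused_seq = np.stack([fuse_modalities(s, p) for s, p in zip(sils, ps)])
print(fused_seq.shape)  # (4, 64, 44) — same shape a silhouette-only model expects
```

Because the fused sequence keeps the shape of an ordinary silhouette sequence, it can be dropped into an existing gait recognition model unchanged, which is what makes the approach plug-and-play.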