🤖 AI Summary
In gait recognition, unordered set modeling neglects short-term temporal dependencies, while ordered sequence modeling struggles to capture long-range correlations. To address this, we propose the “gait snippet” paradigm, representing human gait as a personalized, multi-scale composition of action snippets—thereby unifying short- and long-range temporal context modeling. Our method comprises two core components: snippet sampling and snippet modeling, leveraging a lightweight 2D convolutional backbone for efficient snippet-level feature extraction and aggregation. This work introduces the snippet concept to gait recognition for the first time, breaking away from the conventional dichotomy of set- versus sequence-based modeling. Evaluated on Gait3D and GREW benchmarks, our approach achieves rank-1 accuracies of 77.5% and 81.7%, respectively, demonstrating strong effectiveness, robustness, and cross-scenario generalization capability.
📝 Abstract
Recent advancements in gait recognition have significantly enhanced performance by treating silhouettes as either an unordered set or an ordered sequence. However, both set-based and sequence-based approaches exhibit notable limitations. Specifically, set-based methods tend to overlook short-range temporal context for individual frames, while sequence-based methods struggle to capture long-range temporal dependencies effectively. To address these challenges, we draw inspiration from how humans identify individuals and propose a new perspective that conceptualizes human gait as a composition of individualized actions. Each action is represented by a series of frames, randomly selected from a continuous segment of the sequence, which we term a snippet. Fundamentally, the collection of snippets for a given sequence incorporates multi-scale temporal context, facilitating more comprehensive gait feature learning. Moreover, we introduce a non-trivial solution for snippet-based gait recognition, built on two key components: Snippet Sampling and Snippet Modeling. Extensive experiments on four widely used gait datasets validate the effectiveness of our proposed approach and, more importantly, highlight the potential of gait snippets. For instance, our method achieves rank-1 accuracies of 77.5% on Gait3D and 81.7% on GREW using a 2D convolution-based backbone.
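The snippet idea described above (random frames drawn from a continuous segment, at several temporal scales) can be sketched in a few lines. This is an illustrative reconstruction, not the paper's actual sampling code: the function name `sample_snippets`, the segment lengths, and the frames-per-snippet count are all assumptions chosen for clarity.

```python
import random

def sample_snippets(num_frames, segment_lengths=(4, 8, 16),
                    frames_per_snippet=4, seed=None):
    """Sketch of multi-scale snippet sampling for a gait sequence.

    For each temporal scale, pick a random continuous segment of the
    sequence, then randomly select a few frame indices inside it.
    Short segments capture short-range context; long segments span
    long-range dependencies. All defaults here are hypothetical.
    """
    rng = random.Random(seed)
    snippets = []
    for seg_len in segment_lengths:
        seg_len = min(seg_len, num_frames)          # clamp to sequence length
        start = rng.randint(0, num_frames - seg_len)  # random segment start
        segment = list(range(start, start + seg_len))
        k = min(frames_per_snippet, seg_len)
        # Frames are picked without replacement inside the segment,
        # then sorted so the snippet preserves temporal order.
        snippets.append(sorted(rng.sample(segment, k)))
    return snippets
```

Each returned snippet is a short ordered list of frame indices confined to one contiguous window, so a downstream backbone can extract per-snippet features and aggregate them across scales.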