Language-Guided and Motion-Aware Gait Representation for Generalizable Recognition

📅 2026-01-17
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing gait recognition methods are highly susceptible to static nuisances such as clothing variations and struggle to model dynamic motion characteristics, which limits their generalization. To address this, the paper proposes LMGait, a framework that, for the first time, integrates language-guided mechanisms with motion-aware representation learning. Gait-related textual prompts steer the model's attention toward key dynamic regions, yielding representations that are robust to static interference yet sensitive to motion changes. This design substantially mitigates overfitting to static noise and improves both accuracy and generalization in challenging cross-scenario gait recognition settings.
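The summary describes steering spatial attention with gait-related textual prompts. As a minimal, hypothetical PyTorch sketch of what such language-guided attention could look like (the prompt wording, CLIP-style text encoder, and feature dimensions are assumptions, not the authors' released code):

```python
# Hypothetical sketch of language-guided spatial attention for gait features.
# The prompt wording, encoder choice, and dimensions are assumptions, not the
# authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LanguageGuidedAttention(nn.Module):
    """Weights spatial locations of a gait feature map by their
    similarity to an embedding of a gait-related text prompt."""

    def __init__(self, visual_dim: int = 256, text_dim: int = 512):
        super().__init__()
        # Project the text embedding into the visual feature space.
        self.text_proj = nn.Linear(text_dim, visual_dim)

    def forward(self, feat: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) per-frame visual features
        # text_emb: (B, D_text) embedding of a prompt such as
        # "the swinging arms and legs of a walking person"
        q = F.normalize(self.text_proj(text_emb), dim=-1)  # (B, C)
        v = F.normalize(feat.flatten(2), dim=1)            # (B, C, H*W)
        # Cosine similarity between the prompt and every spatial location.
        attn = torch.einsum('bc,bcn->bn', q, v)            # (B, H*W)
        attn = attn.softmax(dim=-1).view(feat.size(0), 1, *feat.shape[2:])
        # Emphasize motion-relevant regions; the residual keeps static context.
        return feat + feat * attn

# Usage sketch (backbone and text encoder are placeholders):
# feats = backbone(silhouettes)          # (B, 256, H, W)
# text  = clip_model.encode_text(tokens) # (B, 512)
# out   = LanguageGuidedAttention()(feats, text)
```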

πŸ“ Abstract
Gait recognition is emerging as a promising technology and an innovative field within computer vision, with a wide range of applications in remote human identification. However, existing methods typically rely on complex architectures to directly extract features from images and apply pooling operations to obtain sequence-level representations. Such designs often lead to overfitting on static noise (e.g., clothing), while failing to effectively capture dynamic motion regions, such as the arms and legs. This bottleneck is particularly challenging in the presence of intra-class variation, where gait features of the same individual under different environmental conditions are significantly distant in the feature space. To address the above challenges, we present a Language-guided and Motion-aware gait recognition framework, named LMGait. To the best of our knowledge, LMGait is the first method to introduce natural language descriptions as explicit semantic priors into the gait recognition task. In particular, we utilize designed gait-related language cues to capture key motion features in gait sequences. To improve cross-modal alignment, we propose the Motion Awareness Module (MAM), which refines the language features by adaptively adjusting various levels of semantic information to ensure better alignment with the visual representations. Furthermore, we introduce the Motion Temporal Capture Module (MTCM) to enhance the discriminative capability of gait features and improve the model's motion tracking ability. We conducted extensive experiments across multiple datasets, and the results demonstrate the significant advantages of our proposed network. Specifically, our model achieved accuracies of 88.5%, 97.1%, and 97.5% on the CCPG, SUSTech1K, and CASIA-B datasets, respectively, achieving state-of-the-art performance. Homepage: https://dingwu1021.github.io/LMGait/
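The abstract describes MAM and MTCM only at a high level. As a rough illustration of the temporal-motion idea behind a module like MTCM, here is a minimal sketch assuming a frame-difference plus temporal-convolution design; the layer choices, shapes, and pooling are assumptions, not the paper's actual MTCM:

```python
# Hypothetical sketch of a motion-temporal capture step. The design below
# (frame differencing + temporal conv + max pooling) is an assumption, not
# the authors' MTCM implementation.
import torch
import torch.nn as nn

class MotionTemporalCapture(nn.Module):
    """Highlights inter-frame motion by combining per-frame features
    with their temporal differences, then mixing along time."""

    def __init__(self, channels: int = 256, kernel: int = 3):
        super().__init__()
        self.temporal_mix = nn.Conv1d(
            channels, channels, kernel_size=kernel, padding=kernel // 2)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (B, T, C) frame-level gait features
        # Temporal differences emphasize moving parts (arms, legs)
        # over static appearance such as clothing.
        diff = seq[:, 1:] - seq[:, :-1]                        # (B, T-1, C)
        diff = torch.cat([diff, diff[:, -1:]], dim=1)          # pad back to T
        mixed = self.temporal_mix((seq + diff).transpose(1, 2))  # (B, C, T)
        # Sequence-level representation via temporal max pooling.
        return mixed.max(dim=-1).values                        # (B, C)
```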
Problem

Research questions and friction points this paper is trying to address.

gait recognition
overfitting
static noise
dynamic motion
feature representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

language-guided
motion-aware
gait recognition
generalizable representation
dynamic motion features
Zhengxian Wu
Tsinghua University
Computer Vision, Large Language Model
Chuanrui Zhang
Tsinghua University
Computer Vision
Shenao Jiang
The Shenzhen International Graduate School, Tsinghua University
Hangrui Xu
The Shenzhen International Graduate School, Tsinghua University and School of Computer Science and Information Engineering, Hefei University of Technology
Zirui Liao
The Shenzhen International Graduate School, Tsinghua University
Luyuan Zhang
The Shenzhen International Graduate School, Tsinghua University
Huaqiu Li
Tsinghua University
Computer Vision, Machine Learning
Peng Jiao
The Shenzhen International Graduate School, Tsinghua University
Haoqian Wang
The Shenzhen International Graduate School, Tsinghua University