Walking Further: Semantic-aware Multimodal Gait Recognition Under Long-Range Conditions

📅 2026-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing gait recognition methods are largely confined to short-range, single-modality scenarios and struggle with long-range and cross-range recognition in real-world settings. To bridge this gap, this work introduces LRGait, the first LiDAR-camera multimodal benchmark designed specifically for long-range gait recognition, together with an end-to-end framework named EMGaitNet. EMGaitNet mitigates the modality gap between 2D visual and 3D geometric features through CLIP-driven semantic mining, semantic-guided cross-modal alignment, and a symmetric cross-attention fusion mechanism, and incorporates spatio-temporal dynamic modeling for added robustness. Extensive experiments show that EMGaitNet significantly outperforms state-of-the-art methods across multiple datasets, with especially strong gains in long-range and cross-range scenarios.

📝 Abstract
Gait recognition is an emerging biometric technology that enables non-intrusive and hard-to-spoof human identification. However, most existing methods are confined to short-range, unimodal settings and fail to generalize to long-range and cross-distance scenarios under real-world conditions. To address this gap, we present LRGait, the first LiDAR-camera multimodal benchmark designed for robust long-range gait recognition across diverse outdoor distances and environments. We further propose EMGaitNet, an end-to-end framework tailored for long-range multimodal gait recognition. To bridge the modality gap between RGB images and point clouds, we introduce a semantic-guided fusion pipeline. A CLIP-based Semantic Mining (SeMi) module first extracts human body-part-aware semantic cues, which are then employed to align 2D and 3D features via a Semantic-Guided Alignment (SGA) module within a unified embedding space. A Symmetric Cross-Attention Fusion (SCAF) module hierarchically integrates visual contours and 3D geometric features, and a Spatio-Temporal (ST) module captures global gait dynamics. Extensive experiments on various gait datasets validate the effectiveness of our method.
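The symmetric cross-attention fusion described in the abstract can be illustrated with a minimal NumPy sketch: each modality's features attend over the other modality's features, and the two attended streams are merged. Note this is a generic sketch of the idea under stated assumptions, not the paper's actual SCAF implementation; the function names, single-head attention, and simple averaging fusion are all illustrative choices.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats):
    # Scaled dot-product attention: queries come from one modality,
    # keys/values from the other (projections omitted for brevity).
    d = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ kv_feats

def symmetric_cross_attention_fusion(rgb_feats, lidar_feats):
    # Symmetric: RGB attends to LiDAR and LiDAR attends to RGB;
    # here the two attended streams are simply averaged.
    rgb_attended = cross_attention(rgb_feats, lidar_feats)
    lidar_attended = cross_attention(lidar_feats, rgb_feats)
    return 0.5 * (rgb_attended + lidar_attended)

rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 16))    # 8 tokens of 2D visual features
lidar = rng.standard_normal((8, 16))  # 8 tokens of 3D geometric features
fused = symmetric_cross_attention_fusion(rgb, lidar)
print(fused.shape)  # (8, 16)
```

In practice such a block would use learned query/key/value projections and be stacked hierarchically, as the paper's SCAF module reportedly does across feature levels.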
Problem

Research questions and friction points this paper is trying to address.

gait recognition
long-range
multimodal
cross-distance
biometric identification
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal gait recognition
long-range perception
semantic-guided fusion
LiDAR-camera fusion
CLIP-based semantic mining
Zhiyang Lu
Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University
Wen Jiang
Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University
Tianren Wu
Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University
Zhichao Wang
Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University
Changwang Zhang
OPPO Research Institute
Siqi Shen
Xiamen University
Reinforcement Learning · 3D Vision
Ming Cheng
Dartmouth College