Walking Further: Semantic-aware Multimodal Gait Recognition Under Long-Range Conditions

📅 2026-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing gait recognition methods are largely confined to short-range, single-modality scenarios and struggle with long-range and cross-range recognition in real-world settings. To bridge this gap, this work introduces LRGait, the first LiDAR-camera multimodal benchmark designed specifically for long-range gait recognition, together with an end-to-end framework named EMGaitNet. EMGaitNet mitigates the modality gap between 2D visual and 3D geometric features through CLIP-driven semantic mining, semantic-guided cross-modal alignment, and a symmetric cross-attention fusion mechanism, and incorporates spatio-temporal dynamic modeling for added robustness. Extensive experiments show that EMGaitNet significantly outperforms state-of-the-art methods across multiple datasets, with especially strong gains in long-range and cross-range scenarios.

📝 Abstract
Gait recognition is an emerging biometric technology that enables non-intrusive and hard-to-spoof human identification. However, most existing methods are confined to short-range, unimodal settings and fail to generalize to long-range and cross-distance scenarios under real-world conditions. To address this gap, we present LRGait, the first LiDAR-camera multimodal benchmark designed for robust long-range gait recognition across diverse outdoor distances and environments. We further propose EMGaitNet, an end-to-end framework tailored for long-range multimodal gait recognition. To bridge the modality gap between RGB images and point clouds, we introduce a semantic-guided fusion pipeline. A CLIP-based Semantic Mining (SeMi) module first extracts human body-part-aware semantic cues, which are then employed to align 2D and 3D features via a Semantic-Guided Alignment (SGA) module within a unified embedding space. A Symmetric Cross-Attention Fusion (SCAF) module hierarchically integrates visual contours and 3D geometric features, and a Spatio-Temporal (ST) module captures global gait dynamics. Extensive experiments on various gait datasets validate the effectiveness of our method.
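The symmetric cross-attention fusion described in the abstract can be illustrated with a minimal NumPy sketch: each modality's features attend over the other modality's features, and the two attended streams are merged. Note this is a generic sketch of the idea under stated assumptions, not the paper's actual SCAF implementation; the function names, single-head attention, and simple averaging fusion are all illustrative choices.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats):
    # Scaled dot-product attention: queries come from one modality,
    # keys/values from the other (projections omitted for brevity).
    d = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ kv_feats

def symmetric_cross_attention_fusion(rgb_feats, lidar_feats):
    # Symmetric: RGB attends to LiDAR and LiDAR attends to RGB;
    # here the two attended streams are simply averaged.
    rgb_attended = cross_attention(rgb_feats, lidar_feats)
    lidar_attended = cross_attention(lidar_feats, rgb_feats)
    return 0.5 * (rgb_attended + lidar_attended)

rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 16))    # 8 tokens of 2D visual features
lidar = rng.standard_normal((8, 16))  # 8 tokens of 3D geometric features
fused = symmetric_cross_attention_fusion(rgb, lidar)
print(fused.shape)  # (8, 16)
```

In practice such a block would use learned query/key/value projections and be stacked hierarchically, as the paper's SCAF module reportedly does across feature levels.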
Problem

Research questions and friction points this paper is trying to address.

gait recognition
long-range
multimodal
cross-distance
biometric identification
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal gait recognition
long-range perception
semantic-guided fusion
LiDAR-camera fusion
CLIP-based semantic mining
Zhiyang Lu
Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University
Wen Jiang
Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University
Tianren Wu
Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University
Zhichao Wang
Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University
Changwang Zhang
OPPO Research Institute
Siqi Shen
Xiamen University
Reinforcement Learning · 3D Vision
Ming Cheng
Dartmouth College