DepthGait: Multi-Scale Cross-Level Feature Fusion of RGB-Derived Depth and Silhouette Sequences for Robust Gait Recognition

πŸ“… 2025-08-05
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address viewpoint sensitivity and insufficient fine-grained representation in gait recognition, this paper proposes DepthGaitβ€”a novel multimodal framework that fuses temporally aligned depth maps (estimated from RGB sequences) with binary silhouette sequences. Methodologically, it employs a lightweight depth estimation network to generate depth sequences, leverages a multi-scale CNN for spatiotemporal feature extraction, and introduces a cross-level, cross-modal fusion module that explicitly aligns and complements geometric and dynamic cues from depth and silhouettes, thereby mitigating modality discrepancy. Evaluated on CASIA-B and OU-MVLP benchmarks, DepthGait achieves state-of-the-art Rank-1 accuracy, notably improving performance by up to 3.2% under large viewpoint variations (>45Β°). These results demonstrate the effectiveness of jointly enhancing fine-grained gait modeling and viewpoint robustness through complementary multimodal representation.
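The fusion idea in the summary above, silhouette dynamics gating RGB-derived geometric cues at each scale and then aggregating across levels, can be sketched as follows. This is a hedged NumPy illustration under assumed shapes, not the paper's actual operator: `cross_level_fuse`, the sigmoid gate, and the pooled cross-level descriptor are all illustrative assumptions.

```python
import numpy as np

def cross_level_fuse(depth_feats, sil_feats):
    """Sketch of a cross-level, cross-modal fusion step (illustrative only).

    depth_feats, sil_feats: lists of (C, H, W) feature maps, one per scale
    (coarse -> fine). Each same-scale pair is combined by letting the
    silhouette features gate the depth features, then every fused level is
    global-average-pooled and concatenated into one descriptor.
    """
    fused = []
    for d, s in zip(depth_feats, sil_feats):
        gate = 1.0 / (1.0 + np.exp(-s))   # sigmoid gate from silhouette cues
        fused.append(d * gate + s)         # geometric + dynamic cues combined
    # cross-level aggregation: pool each scale to a (C,) vector, then concat
    pooled = [f.mean(axis=(1, 2)) for f in fused]
    return fused, np.concatenate(pooled)

# toy example: two scales, 4 channels each
rng = np.random.default_rng(0)
depth = [rng.standard_normal((4, 8, 8)), rng.standard_normal((4, 4, 4))]
sil = [rng.standard_normal((4, 8, 8)), rng.standard_normal((4, 4, 4))]
fused_maps, descriptor = cross_level_fuse(depth, sil)
print(descriptor.shape)  # (8,)
```

In the paper the fusion weights would be learned end-to-end; the fixed sigmoid gate here only illustrates how one modality can modulate the other before cross-level aggregation.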

πŸ“ Abstract
Robust gait recognition requires highly discriminative representations, which are closely tied to input modalities. While binary silhouettes and skeletons have dominated recent literature, these 2D representations fall short of capturing sufficient cues to handle viewpoint variations and to capture finer, meaningful details of gait. In this paper, we introduce a novel framework, termed DepthGait, that incorporates RGB-derived depth maps and silhouettes for enhanced gait recognition. Specifically, apart from the 2D silhouette representation of the human body, the proposed pipeline explicitly estimates depth maps from a given RGB image sequence and uses them as a new modality to capture discriminative features inherent in human locomotion. In addition, a novel multi-scale and cross-level fusion scheme has also been developed to bridge the modality gap between depth maps and silhouettes. Extensive experiments on standard benchmarks demonstrate that the proposed DepthGait achieves state-of-the-art performance compared to peer methods and attains impressive mean rank-1 accuracy on challenging datasets.
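As a rough illustration of the two-stream input described in the abstract, the sketch below pairs per-frame depth estimates with silhouettes to form temporally aligned modality streams. `estimate_depth` is a trivial intensity-based stand-in for the paper's lightweight learned depth network, and `build_gait_inputs` is a hypothetical helper name; neither comes from the paper.

```python
import numpy as np

def estimate_depth(rgb_frame):
    """Hypothetical stand-in for the paper's lightweight monocular depth
    estimator: here, just a normalized grayscale intensity map."""
    gray = rgb_frame.mean(axis=-1)  # (H, W) intensity
    return (gray - gray.min()) / (gray.max() - gray.min() + 1e-8)

def build_gait_inputs(rgb_seq, sil_seq):
    """Pair each RGB frame's estimated depth map with its silhouette,
    yielding the two temporally aligned streams that DepthGait fuses."""
    depth_seq = np.stack([estimate_depth(f) for f in rgb_seq])
    return depth_seq, np.stack(sil_seq)

# toy sequence: two 64x44 frames (a common silhouette resolution)
rgb_seq = [np.random.rand(64, 44, 3) for _ in range(2)]
sil_seq = [np.random.randint(0, 2, (64, 44)) for _ in range(2)]
depth, sil = build_gait_inputs(rgb_seq, sil_seq)
print(depth.shape, sil.shape)  # (2, 64, 44) (2, 64, 44)
```

The real pipeline would replace `estimate_depth` with a learned network and feed both stacked sequences into the multi-scale CNN backbone.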
Problem

Research questions and friction points this paper is trying to address.

Enhance gait recognition with RGB-derived depth maps alongside silhouettes
Handle viewpoint variations and capture finer gait details
Bridge the depth–silhouette modality gap via multi-scale, cross-level fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

RGB-derived depth maps enhance gait recognition
Multi-scale cross-level fusion bridges modality gap
DepthGait achieves state-of-the-art performance
πŸ”Ž Similar Papers
No similar papers found.
Authors
Xinzhu Li — Sun Yat-sen University, Zhuhai, China
Juepeng Zheng — Sun Yat-sen University, Zhuhai, China
Yikun Chen — Guangdong Zhiyun Urban Construction Technology Co., Ltd., Zhuhai, China
Xudong Mao — Sun Yat-sen University (Computer Vision; Deep Learning)
Guanghui Yue — Shenzhen University, Shenzhen, China
Wei Zhou — Cardiff University, Cardiff, UK
Chenlei Lv — Shenzhen University, Shenzhen, China
Ruomei Wang — Sun Yat-sen University, Zhuhai, China
Fan Zhou — Sun Yat-sen University, Guangzhou, China
Baoquan Zhao — Sun Yat-sen University (3D point cloud processing and compression; Multimedia content analysis; Open Educational Resources)