Mesh-Gait: A Unified Framework for Gait Recognition Through Multi-Modal Representation Learning from 2D Silhouettes

📅 2025-10-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address performance degradation in gait recognition caused by viewpoint variations, occlusions, and noise, this paper proposes an efficient and robust multimodal framework that bypasses explicit 3D mesh or skeleton estimation. Instead, it directly reconstructs voxelized 3D heatmaps from single-frame 2D silhouettes as an intermediate representation, integrating geometric priors with appearance features. The method employs an end-to-end deep network jointly optimizing 2D silhouette encoding and 3D heatmap reconstruction, augmented with supervised losses for joint localization, virtual marker placement, and mesh reconstruction. Evaluated on CASIA-B and OU-MVLP benchmarks, it achieves state-of-the-art recognition accuracy—outperforming the best prior methods by an average of 2.3%—while maintaining real-time inference at 32 FPS. The core innovation lies in a lightweight 3D heatmap representation, enabling efficient and structured mapping from 2D input to semantically rich 3D pose representations.

Technology Category

Application Category

📝 Abstract
Gait recognition, a fundamental biometric technology, leverages unique walking patterns for individual identification, typically using 2D representations such as silhouettes or skeletons. However, these methods often struggle with viewpoint variations, occlusions, and noise. Multi-modal approaches that incorporate 3D body shape information offer improved robustness but are computationally expensive, limiting their feasibility for real-time applications. To address these challenges, we introduce Mesh-Gait, a novel end-to-end multi-modal gait recognition framework that directly reconstructs 3D representations from 2D silhouettes, effectively combining the strengths of both modalities. Compared to existing methods, directly learning 3D features from 3D joints or meshes is complex and difficult to fuse with silhouette-based gait features. To overcome this, Mesh-Gait reconstructs 3D heatmaps as an intermediate representation, enabling the model to effectively capture 3D geometric information while maintaining simplicity and computational efficiency. During training, the intermediate 3D heatmaps are gradually reconstructed and become increasingly accurate under supervised learning, where the loss is calculated between the reconstructed 3D joints, virtual markers, and 3D meshes and their corresponding ground truth, ensuring precise spatial alignment and consistent 3D structure. Mesh-Gait extracts discriminative features from both silhouettes and reconstructed 3D heatmaps in a computationally efficient manner. This design enables the model to capture spatial and structural gait characteristics while avoiding the heavy overhead of direct 3D reconstruction from RGB videos, allowing the network to focus on motion dynamics rather than irrelevant visual details. Extensive experiments demonstrate that Mesh-Gait achieves state-of-the-art accuracy. The code will be released upon acceptance of the paper.
Problem

Research questions and friction points this paper is trying to address.

Addressing gait recognition challenges from viewpoint variations and occlusions
Overcoming computational inefficiency in multi-modal 3D gait recognition methods
Enabling effective fusion of 3D geometric information with silhouette features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reconstructs 3D heatmaps from 2D silhouettes
Combines silhouette and 3D features efficiently
Uses supervised learning for precise 3D alignment
🔎 Similar Papers
No similar papers found.