Combo-Gait: Unified Transformer Framework for Multi-Modal Gait Recognition and Attribute Analysis

📅 2025-10-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing gait recognition methods are largely confined to single-modal (2D or 3D) inputs and single-task (identity-only) learning, limiting their ability to model the geometric-dynamic coupling of gait under long-range, complex scenarios and precluding joint analysis of human attributes such as age, gender, and BMI. To address these limitations, we propose the first unified multi-modal, multi-task gait analysis framework. It jointly encodes 2D temporal silhouettes and 3D SMPL mesh parameters within a shared Transformer architecture to enable cross-modal feature co-learning, while simultaneously optimizing identity recognition and attribute estimation. A dedicated multi-task loss function is introduced to enhance generalization. Evaluated on the large-scale BRIAR dataset under extreme conditions (observation distances up to 1 km and pitch angles up to 50°), our method significantly outperforms state-of-the-art approaches, achieving both high-accuracy identity recognition and robust attribute inference.
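The summary mentions a dedicated multi-task loss that balances identity recognition against attribute estimation, but does not give its formulation. A minimal sketch of such a loss, assuming cross-entropy for identity, mean-squared error for age and BMI regression, binary cross-entropy for gender, and hypothetical weighting coefficients, might look like:

```python
import torch.nn as nn

class MultiTaskGaitLoss(nn.Module):
    """Sketch of a joint identity + attribute loss.

    The per-task loss choices and the weights are illustrative
    assumptions; the paper's exact formulation may differ.
    """

    def __init__(self, w_id=1.0, w_age=0.1, w_bmi=0.1, w_gender=0.1):
        super().__init__()
        self.id_loss = nn.CrossEntropyLoss()       # identity classification
        self.reg_loss = nn.MSELoss()               # age and BMI regression
        self.gender_loss = nn.BCEWithLogitsLoss()  # binary gender classification
        self.w = (w_id, w_age, w_bmi, w_gender)

    def forward(self, id_logits, id_labels, age_pred, age_gt,
                bmi_pred, bmi_gt, gender_logits, gender_gt):
        w_id, w_age, w_bmi, w_gender = self.w
        return (w_id * self.id_loss(id_logits, id_labels)
                + w_age * self.reg_loss(age_pred, age_gt)
                + w_bmi * self.reg_loss(bmi_pred, bmi_gt)
                + w_gender * self.gender_loss(gender_logits, gender_gt))
```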

📝 Abstract
Gait recognition is an important biometric for human identification at a distance, particularly under low-resolution or unconstrained environments. Current works typically focus on either 2D representations (e.g., silhouettes and skeletons) or 3D representations (e.g., meshes and SMPLs), but relying on a single modality often fails to capture the full geometric and dynamic complexity of human walking patterns. In this paper, we propose a multi-modal and multi-task framework that combines 2D temporal silhouettes with 3D SMPL features for robust gait analysis. Beyond identification, we introduce a multitask learning strategy that jointly performs gait recognition and human attribute estimation, including age, body mass index (BMI), and gender. A unified transformer is employed to effectively fuse multi-modal gait features and better learn attribute-related representations, while preserving discriminative identity cues. Extensive experiments on the large-scale BRIAR datasets, collected under challenging conditions such as long-range distances (up to 1 km) and extreme pitch angles (up to 50°), demonstrate that our approach outperforms state-of-the-art methods in gait recognition and provides accurate human attribute estimation. These results highlight the promise of multi-modal and multitask learning for advancing gait-based human understanding in real-world scenarios.
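To make the fusion idea concrete, the sketch below shows one way a shared transformer could co-encode per-frame 2D silhouette embeddings and 3D SMPL parameter vectors as a single token sequence, then attach identity and attribute heads. The token layout, dimensions (including the 85-dimensional SMPL vector of pose, shape, and camera/translation parameters), and pooling are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class UnifiedGaitTransformer(nn.Module):
    """Minimal sketch: fuse 2D silhouette and 3D SMPL tokens in one encoder."""

    def __init__(self, d_model=256, n_heads=8, n_layers=4,
                 silhouette_dim=64 * 64, smpl_dim=85, n_ids=1000):
        super().__init__()
        # Project each modality into a shared token space.
        self.sil_proj = nn.Linear(silhouette_dim, d_model)
        self.smpl_proj = nn.Linear(smpl_dim, d_model)
        # Learned modality embeddings distinguish 2D tokens from 3D tokens.
        self.modality_emb = nn.Parameter(torch.zeros(2, d_model))
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Task heads: identity classification plus attribute estimation.
        self.id_head = nn.Linear(d_model, n_ids)
        self.age_head = nn.Linear(d_model, 1)
        self.bmi_head = nn.Linear(d_model, 1)
        self.gender_head = nn.Linear(d_model, 1)

    def forward(self, silhouettes, smpl_params):
        # silhouettes: (B, T, H*W) flattened frames; smpl_params: (B, T, 85)
        sil_tokens = self.sil_proj(silhouettes) + self.modality_emb[0]
        smpl_tokens = self.smpl_proj(smpl_params) + self.modality_emb[1]
        tokens = torch.cat([sil_tokens, smpl_tokens], dim=1)  # (B, 2T, d_model)
        fused = self.encoder(tokens).mean(dim=1)               # temporal pooling
        return {
            "id_logits": self.id_head(fused),
            "age": self.age_head(fused),
            "bmi": self.bmi_head(fused),
            "gender_logits": self.gender_head(fused),
        }
```

The outputs of this module would feed directly into a joint loss such as the one sketched above, so identity and attribute supervision shape the same fused representation.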
Problem

Research questions and friction points this paper is trying to address.

Combining 2D silhouettes and 3D SMPL features for robust gait analysis
Jointly performing gait recognition and human attribute estimation
Addressing gait recognition under long-range and extreme angle conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines 2D silhouettes with 3D SMPL features
Uses unified transformer for multi-modal fusion
Jointly performs gait recognition and attribute estimation