🤖 AI Summary
Existing large vision model (LVM)-based gait recognition methods rely heavily on handcrafted gait priors, overlooking the intrinsic potential of LVMs' hierarchical representations. This work proposes a general baseline framework that requires no strong gait-specific priors and, for the first time, systematically reveals the cross-layer complementarity of intermediate-layer representations in LVMs for gait recognition. The method leverages layered fusion, multi-source transfer learning, feature disentanglement, and ensemble strategies to efficiently exploit the hierarchical semantic structure of LVMs. Evaluated on four major benchmarks—CCPG, CASIA-B*, SUSTech1K, and CCGR_MINI—the approach significantly outperforms prior LVM-based methods, demonstrating superior cross-domain robustness and generalization. It establishes a new paradigm for fine-grained behavioral recognition driven by LVMs, shifting the focus from task-specific priors to principled use of the model's inherent semantics.
📝 Abstract
Large vision model (LVM)-based gait recognition has achieved impressive performance. However, existing LVM-based approaches may overemphasize gait priors while neglecting the intrinsic value of the LVM itself, particularly the rich, distinct representations across its multiple layers. To fully unlock the potential of LVMs, this work investigates the impact of layer-wise representations on downstream recognition tasks. Our analysis reveals that the intermediate layers of an LVM offer complementary properties across tasks; integrating them yields an impressive improvement even without well-designed gait priors. Building on this insight, we propose a simple and universal baseline for LVM-based gait recognition, termed BiggerGait. Comprehensive evaluations on CCPG, CASIA-B*, SUSTech1K, and CCGR_MINI validate the superiority of BiggerGait across both within- and cross-domain tasks, establishing it as a simple yet practical baseline for gait representation learning. All models and code will be publicly available.