Explicit Time-Frequency Dynamics for Skeleton-Based Gait Recognition

📅 2026-04-03
🤖 AI Summary
Existing skeleton-based gait recognition methods lose accuracy under appearance variations, such as carrying a bag or wearing a coat, because they model motion dynamics only implicitly. To address this, the work proposes a plug-and-play Wavelet Feature Stream that brings the Continuous Wavelet Transform (CWT) into skeleton sequence modeling. The stream converts per-joint velocity sequences into multi-scale time-frequency representations (scalograms) and uses a lightweight multi-scale CNN to extract dynamic cues, which are then fused with the backbone's features. The method requires neither architectural changes to the backbone nor additional supervision, and it yields consistent gains across skeleton backbones such as GaitMixer, GaitFormer, and GaitGraph. Attached to GaitMixer, it sets a new skeleton-based state of the art on the CASIA-B dataset, with especially pronounced gains in the challenging BG (carrying a bag) and CL (wearing a coat) conditions.
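For reference, the transform named in the summary can be written out explicitly. The notation is an assumption made for illustration: the symbols p_j, v_j, ψ, a, b, W_j, and S_j do not appear in the source. The first difference is only one common way to define joint velocity, so the paper may use a different convention.

```latex
v_j(t) = p_j(t+1) - p_j(t), \qquad
W_j(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} v_j(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt, \qquad
S_j(a, b) = \lvert W_j(a, b) \rvert
```

Here p_j(t) is a coordinate of joint j at frame t, ψ is the mother wavelet, a > 0 is the scale, b is the time shift, and S_j is the scalogram. Small scales respond to fast changes in velocity and large scales to slow ones, which is what makes the representation multi-scale.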
📝 Abstract
Skeleton-based gait recognizers excel at modeling spatial configurations but often underuse explicit motion dynamics that are crucial under appearance changes. We introduce a plug-and-play Wavelet Feature Stream that augments any skeleton backbone with time-frequency dynamics of joint velocities. Concretely, per-joint velocity sequences are transformed by the continuous wavelet transform (CWT) into multi-scale scalograms, from which a lightweight multi-scale CNN learns discriminative dynamic cues. The resulting descriptor is fused with the backbone representation for classification, requiring no changes to the backbone architecture or additional supervision. Across CASIA-B, the proposed stream delivers consistent gains on strong skeleton backbones (e.g., GaitMixer, GaitFormer, GaitGraph) and establishes a new skeleton-based state of the art when attached to GaitMixer. The improvements are especially pronounced under covariate shifts such as carrying bags (BG) and wearing coats (CL), highlighting the complementarity of explicit time-frequency modeling and standard spatio-temporal encoders.
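The first two steps of the pipeline described above (joint velocities, then a CWT scalogram) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the Ricker (Mexican-hat) wavelet, the scale set, the 25 fps frame rate, and the synthetic 1 Hz "ankle" track are all hypothetical choices, since the abstract does not specify them. The multi-scale CNN and the fusion step are omitted. The sketch uses only the standard library so it runs anywhere; real code would more likely use an array library or a wavelet package.

```python
import math


def joint_velocities(positions):
    """First-order temporal difference of one joint-coordinate track."""
    return [b - a for a, b in zip(positions, positions[1:])]


def ricker(t):
    """Unnormalised Ricker (Mexican-hat) mother wavelet; an assumed choice."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def cwt_scalogram(signal, scales):
    """Direct-sum CWT magnitude: one row per scale, one column per frame."""
    n = len(signal)
    rows = []
    for a in scales:
        norm = 1.0 / math.sqrt(a)  # the 1/sqrt(a) factor of the CWT definition
        row = []
        for b in range(n):
            # Correlate the signal with the wavelet dilated by a and shifted to b.
            acc = sum(signal[t] * ricker((t - b) / a) for t in range(n))
            row.append(abs(norm * acc))
        rows.append(row)
    return rows


# Synthetic stand-in for one coordinate of one joint (hypothetical data).
fps = 25
frames = 100
track = [math.sin(2 * math.pi * 1.0 * t / fps) for t in range(frames)]

vel = joint_velocities(track)           # 99 velocity samples from 100 frames
scales = [1, 2, 4, 8, 16]               # assumed dyadic scale set
scalogram = cwt_scalogram(vel, scales)  # 5 x 99 grid of non-negative magnitudes
```

In a full system, one such scalogram per joint (and per coordinate) would be stacked into an image-like tensor and passed to the CNN. The direct sum costs O(n²) per scale, which is acceptable for short gait clips but would normally be replaced by an FFT-based convolution.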
Problem

Research questions and friction points this paper is trying to address.

skeleton-based gait recognition
motion dynamics
appearance changes
time-frequency dynamics
covariate shifts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Wavelet Feature Stream
Continuous Wavelet Transform
Time-Frequency Dynamics
Skeleton-Based Gait Recognition
Multi-scale CNN
Authors
Seoyeon Ko
Ewha Womans University, Seoul, Korea
Yeojin Song
Ewha Womans University, Seoul, Korea
Egene Chung
Ewha Womans University, Seoul, Korea
Luca Quagliato
University of Trento, Trento TN, Italy
Taeyong Lee
Division of Mechanical and Biomedical Engineering, Ewha Womans University
Bone Viscoelasticity, Foot & Ankle Biomechanics
Junhyug Noh
Ewha Womans University
Computer Vision, Object Recognition, Weakly Supervised Learning, Active Learning, Medical AI