🤖 AI Summary
This work addresses key weaknesses of existing video-based clinical gait analysis models: they overfit to environmental biases, generalize poorly, and struggle to characterize pathological movement patterns accurately. To overcome these challenges, we propose the first tri-modal framework that integrates visual, linguistic, and biomechanical modalities. Our approach aligns biomechanical information with the language space by projecting 3D skeleton sequences into semantic tokens, and pairs this with temporal evidence distillation of rhythmic dynamics. Together, these let the model reason explicitly about joint mechanics instead of relying on visual shortcuts. On a unified dataset covering eight gait categories and evaluated under a subject-disjoint protocol, our method achieves state-of-the-art recognition accuracy. A blinded expert review further shows that the biomechanical tokens improve clinical plausibility and evidence grounding, pointing toward more transparent, privacy-enhanced gait assessment.
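The summary does not say how biomechanical information is turned into semantic tokens, and the actual model most likely uses a learned projection. As an illustration only, the sketch below assumes a simple rule-based stand-in. It computes per-frame knee flexion from 3D hip, knee, and ankle positions and maps each value to a discrete text token. The function names, bin edges, and token strings (`joint_angle_deg`, `knee_flexion_tokens`, `FLEXION_BINS`, `<knee_flex_*>`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]

# Hypothetical vocabulary: upper bin edge (degrees of knee flexion) -> token.
# These thresholds are illustrative, not clinical reference values.
FLEXION_BINS = [
    (15.0, "<knee_flex_low>"),
    (45.0, "<knee_flex_mid>"),
    (float("inf"), "<knee_flex_high>"),
]


def joint_angle_deg(a: Point3D, b: Point3D, c: Point3D) -> float:
    """Interior angle at joint b (degrees) between segments b->a and b->c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("zero-length limb segment")
    dot = sum(x * y for x, y in zip(v1, v2))
    # Clamp to [-1, 1] so floating-point error cannot break acos.
    cos_theta = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_theta))


def knee_flexion_tokens(
    hips: Sequence[Point3D], knees: Sequence[Point3D], ankles: Sequence[Point3D]
) -> List[str]:
    """Map each frame of a skeleton sequence to one hypothetical flexion token."""
    tokens: List[str] = []
    for hip, knee, ankle in zip(hips, knees, ankles):
        # A straight leg has a 180-degree hip-knee-ankle angle, i.e. 0 flexion.
        flexion = 180.0 - joint_angle_deg(hip, knee, ankle)
        for upper, token in FLEXION_BINS:
            if flexion < upper:
                tokens.append(token)
                break
    return tokens


if __name__ == "__main__":
    hip, knee = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0)
    ankles = [(0.0, -1.0, 0.0), (0.5, -math.sqrt(3) / 2, 0.0), (1.0, 0.0, 0.0)]
    print(knee_flexion_tokens([hip] * 3, [knee] * 3, ankles))
```

The three example frames correspond to roughly 0, 30, and 90 degrees of flexion. Discrete tokens like these can be placed in a language model's prompt alongside visual features, which is one simple way to make joint-level evidence explicit.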
📝 Abstract
Video-based clinical gait analysis often generalizes poorly because models overfit to environmental biases instead of capturing pathological motion. To address this, we propose BioGait-VLM, a tri-modal Vision-Language-Biomechanics framework for interpretable clinical gait assessment. Unlike standard video encoders, our architecture incorporates a Temporal Evidence Distillation branch to capture rhythmic dynamics and a Biomechanical Tokenization branch that projects 3D skeleton sequences into language-aligned semantic tokens. This enables the model to reason explicitly about joint mechanics, independent of visual shortcuts. To ensure rigorous benchmarking, we augment the public GAVD dataset with a high-fidelity Degenerative Cervical Myelopathy (DCM) cohort to form a unified 8-class taxonomy, and we establish a strict subject-disjoint protocol to prevent data leakage. Under this setting, BioGait-VLM achieves state-of-the-art recognition accuracy. Furthermore, a blinded expert study confirms that biomechanical tokens significantly improve clinical plausibility and evidence grounding, offering a path toward transparent, privacy-enhanced gait assessment.
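A subject-disjoint protocol means that no person contributes clips to both the training and test sets, so a model cannot score well by recognizing individuals or their recording environments. The abstract does not give the split ratio, seed, or whether classes are stratified. The following is a generic sketch under those unknowns, assuming each clip carries a subject ID. The `(clip_id, subject_id, gait_label)` layout, `subject_disjoint_split`, and the default 20% test fraction are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Clip = Tuple[str, str, str]  # (clip_id, subject_id, gait_label), hypothetical layout


def subject_disjoint_split(
    clips: Sequence[Clip], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Clip], List[Clip]]:
    """Split clips so that every subject appears in exactly one partition."""
    by_subject: Dict[str, List[Clip]] = defaultdict(list)
    for clip in clips:
        by_subject[clip[1]].append(clip)

    subjects = sorted(by_subject)  # fixed order so the seed fully determines the split
    if len(subjects) < 2:
        raise ValueError("need at least two subjects for a disjoint split")
    random.Random(seed).shuffle(subjects)

    # Keep at least one subject on each side of the split.
    n_test = min(max(1, round(len(subjects) * test_fraction)), len(subjects) - 1)
    test_subjects = set(subjects[:n_test])

    train = [c for s in subjects if s not in test_subjects for c in by_subject[s]]
    test = [c for s in subjects if s in test_subjects for c in by_subject[s]]
    return train, test


if __name__ == "__main__":
    data = [(f"s{i}_c{j}", f"s{i}", "normal") for i in range(10) for j in range(3)]
    train, test = subject_disjoint_split(data)
    print(len(train), len(test))
```

Splitting at the subject level rather than the clip level is the essential step. With an 8-class taxonomy, a practical variant would also stratify subjects by gait label so that every class appears in both partitions.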