BioGait-VLM: A Tri-Modal Vision-Language-Biomechanics Framework for Interpretable Clinical Gait Assessment

📅 2026-03-09

🤖 AI Summary

This work addresses a key weakness of existing video-based clinical gait analysis models: they are highly susceptible to environmental biases, generalize poorly, and struggle to characterize pathological movement patterns accurately. To overcome this, the authors propose what they describe as the first tri-modal framework integrating visual, linguistic, and biomechanical modalities. The approach aligns biomechanical information with the language space through semantic tokens and combines temporal evidence distillation with 3D skeleton sequence modeling, so the model can reason explicitly about joint mechanics instead of relying on visual shortcuts. Evaluated on a unified dataset spanning eight gait categories under a subject-disjoint protocol, the method achieves state-of-the-art recognition accuracy. A blinded expert review further indicates that the biomechanical tokens improve clinical plausibility and evidence grounding, pointing toward transparent, privacy-enhanced gait assessment.
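The summary does not say how biomechanical information becomes semantic tokens. The sketch below shows one plausible reading under stated assumptions: each frame's 3D hip, knee, and ankle positions are reduced to a knee joint angle, which is then discretized into a named state. The function names, the knee-only feature, and the 160°/130° bin edges are all hypothetical illustrations, not the authors' tokenizer.

```python
import math
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]

def joint_angle_deg(a: Point3D, b: Point3D, c: Point3D) -> float:
    """Interior angle at joint b, in degrees, between segments b->a and b->c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("degenerate segment: coincident joints")
    cos_t = sum(x * y for x, y in zip(v1, v2)) / (n1 * n2)
    # Clamp to [-1, 1] to guard acos against floating-point drift.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

# Hypothetical bins: 180 deg = fully extended knee; smaller = more flexion.
KNEE_BINS = [(160.0, "knee_extended"), (130.0, "knee_mild_flexion")]
DEEP_FLEXION = "knee_deep_flexion"

def knee_token(angle_deg: float) -> str:
    """Map a knee angle to a hypothetical discrete semantic token."""
    for threshold, name in KNEE_BINS:
        if angle_deg >= threshold:
            return name
    return DEEP_FLEXION

def tokenize_sequence(frames: Sequence[Tuple[Point3D, Point3D, Point3D]]) -> List[str]:
    """Map per-frame (hip, knee, ankle) positions to tokens, merging consecutive repeats."""
    tokens: List[str] = []
    for hip, knee, ankle in frames:
        tok = knee_token(joint_angle_deg(hip, knee, ankle))
        if not tokens or tokens[-1] != tok:
            tokens.append(tok)
    return tokens
```

Merging repeats yields a compact symbolic trace (for example an extended-then-flexed sequence) that could be passed to a language model as text. A fuller version would cover hips, ankles, and trunk, and any bin edges would need clinical validation.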

📝 Abstract

Video-based Clinical Gait Analysis often suffers from poor generalization as models overfit environmental biases instead of capturing pathological motion. To address this, we propose BioGait-VLM, a tri-modal Vision-Language-Biomechanics framework for interpretable clinical gait assessment. Unlike standard video encoders, our architecture incorporates a Temporal Evidence Distillation branch to capture rhythmic dynamics and a Biomechanical Tokenization branch that projects 3D skeleton sequences into language-aligned semantic tokens. This enables the model to explicitly reason about joint mechanics independent of visual shortcuts. To ensure rigorous benchmarking, we augment the public GAVD dataset with a high-fidelity Degenerative Cervical Myelopathy (DCM) cohort to form a unified 8-class taxonomy, establishing a strict subject-disjoint protocol to prevent data leakage. Under this setting, BioGait-VLM achieves state-of-the-art recognition accuracy. Furthermore, a blinded expert study confirms that biomechanical tokens significantly improve clinical plausibility and evidence grounding, offering a path toward transparent, privacy-enhanced gait assessment.
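A subject-disjoint protocol means no individual contributes clips to both the training and the test set, so a model cannot score well by memorizing a person's appearance or recording environment. The abstract does not give the exact split procedure; the sketch below is a generic grouped split under assumed inputs: a hypothetical `(clip_id, subject_id, gait_label)` tuple layout, a 20% test fraction, and a fixed random seed.

```python
import random
from typing import List, Sequence, Set, Tuple

Clip = Tuple[str, str, str]  # hypothetical layout: (clip_id, subject_id, gait_label)

def subject_disjoint_split(
    clips: Sequence[Clip], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Clip], List[Clip]]:
    """Split clips so that every subject lands entirely in train or entirely in test."""
    # Sort before shuffling so the split is reproducible for a given seed.
    subjects = sorted({subject for _, subject, _ in clips})
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects: Set[str] = set(subjects[:n_test])
    train = [c for c in clips if c[1] not in test_subjects]
    test = [c for c in clips if c[1] in test_subjects]
    return train, test
```

This sketch splits by subject only and does not stratify by gait class; with eight classes and small cohorts, a practical protocol would also check that each class is represented on both sides.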

Problem

Research questions and friction points this paper is trying to address.

Clinical Gait Analysis

Generalization

Interpretability

Pathological Motion

Environmental Bias

Innovation

Methods, ideas, or system contributions that make the work stand out.

Tri-modal framework

Biomechanical tokenization

Temporal evidence distillation
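The abstract says only that the Temporal Evidence Distillation branch captures rhythmic dynamics; its internals are not described here, and it is presumably a learned module. As a hedged illustration of one rhythmic cue such a branch might exploit, the sketch below estimates the dominant gait-cycle period of a per-frame joint signal (for example a knee-angle trace) with normalized autocorrelation. The function name, the lag bounds, and the use of autocorrelation are assumptions, not the authors' method.

```python
import math
from typing import Optional, Sequence

def dominant_period(
    signal: Sequence[float], min_lag: int = 2, max_lag: Optional[int] = None
) -> int:
    """Return the lag (in frames) with the highest normalized autocorrelation."""
    n = len(signal)
    if max_lag is None:
        max_lag = n // 2
    if n == 0 or max_lag < min_lag:
        raise ValueError("signal too short for the requested lag range")
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the DC offset
    energy = sum(v * v for v in x)
    if energy == 0.0:
        raise ValueError("constant signal has no rhythm")
    best_lag, best_r = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Biased estimate: dividing by total energy slightly favours shorter lags,
        # which helps pick the fundamental period rather than its multiples.
        r = sum(x[t] * x[t + lag] for t in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag
```

For video recorded at a known frame rate, dividing the returned lag by that rate gives an approximate cycle duration in seconds.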