BioGait-VLM: A Tri-Modal Vision-Language-Biomechanics Framework for Interpretable Clinical Gait Assessment

📅 2026-03-09
🤖 AI Summary
This work addresses limitations of existing video-based clinical gait analysis models, which tend to overfit environmental biases, generalize poorly, and fail to capture pathological movement patterns. To overcome these challenges, we propose BioGait-VLM, a tri-modal framework that integrates visual, linguistic, and biomechanical modalities. A Biomechanical Tokenization branch projects 3D skeleton sequences into language-aligned semantic tokens, and a Temporal Evidence Distillation branch captures rhythmic dynamics, so the model can reason explicitly about joint mechanics instead of relying on visual shortcuts. On a unified 8-class dataset that augments the public GAVD dataset with a Degenerative Cervical Myelopathy (DCM) cohort, and under a strict subject-disjoint protocol, the method achieves state-of-the-art recognition accuracy. A blinded expert study finds that the biomechanical tokens significantly improve clinical plausibility and evidence grounding, pointing toward transparent, privacy-enhanced gait assessment.

📝 Abstract
Video-based Clinical Gait Analysis often suffers from poor generalization as models overfit environmental biases instead of capturing pathological motion. To address this, we propose BioGait-VLM, a tri-modal Vision-Language-Biomechanics framework for interpretable clinical gait assessment. Unlike standard video encoders, our architecture incorporates a Temporal Evidence Distillation branch to capture rhythmic dynamics and a Biomechanical Tokenization branch that projects 3D skeleton sequences into language-aligned semantic tokens. This enables the model to explicitly reason about joint mechanics independent of visual shortcuts. To ensure rigorous benchmarking, we augment the public GAVD dataset with a high-fidelity Degenerative Cervical Myelopathy (DCM) cohort to form a unified 8-class taxonomy, establishing a strict subject-disjoint protocol to prevent data leakage. Under this setting, BioGait-VLM achieves state-of-the-art recognition accuracy. Furthermore, a blinded expert study confirms that biomechanical tokens significantly improve clinical plausibility and evidence grounding, offering a path toward transparent, privacy-enhanced gait assessment.
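The abstract does not say how the Biomechanical Tokenization branch is built, so the following is only an illustrative sketch of the general idea: derive a biomechanical quantity (here, a joint angle) from 3D joint positions, summarize it over temporal windows, and project the summary into a token embedding space. The function names `joint_angle` and `tokenize_skeleton`, the mean/range window statistics, and the linear projection `proj` are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle in degrees at joint b between segments b->a and b->c, per frame.

    a, b, c: arrays of shape (T, 3) holding 3D joint positions
    (for example hip, knee, ankle to get a knee angle).
    """
    u = a - b
    v = c - b
    cos = np.sum(u * v, axis=-1) / (
        np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1) + 1e-8
    )
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def tokenize_skeleton(angles, window, proj):
    """Map per-frame angle features (T, F) to (T // window, D) tokens.

    Hypothetical design: each token summarizes one temporal window by the
    mean and range of every feature, then applies a linear projection
    `proj` of shape (2 * F, D). Trailing frames that do not fill a whole
    window are dropped.
    """
    T, F = angles.shape
    n = T // window
    chunks = angles[: n * window].reshape(n, window, F)
    stats = np.concatenate(
        [chunks.mean(axis=1), chunks.max(axis=1) - chunks.min(axis=1)],
        axis=-1,
    )
    return stats @ proj
```

In a trained system the projection would presumably be learned jointly with the language model so that the tokens land in its embedding space; here `proj` is just a matrix supplied by the caller.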
Problem

Research questions and friction points this paper is trying to address.

Clinical Gait Analysis
Generalization
Interpretability
Pathological Motion
Environmental Bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tri-modal framework
Biomechanical tokenization
Temporal evidence distillation
Interpretable gait assessment
Subject-disjoint benchmarking
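A subject-disjoint protocol means that every clip from a given person falls entirely in one split, so a model cannot score well by recognizing individuals or their recording environments. The sketch below shows one minimal way to build such a split; the `(subject_id, clip)` sample layout, the function name, and the 20% test fraction are assumptions for illustration and are not taken from the paper.

```python
import random
from collections import defaultdict

def subject_disjoint_split(samples, test_fraction=0.2, seed=0):
    """Split (subject_id, clip) pairs so no subject appears in both sets."""
    # Group clips by subject so the split is decided per person, not per clip.
    by_subject = defaultdict(list)
    for subject_id, clip in samples:
        by_subject[subject_id].append(clip)

    # Shuffle subjects deterministically and hold out a fraction of them.
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])

    train = [(s, c) for s, c in samples if s not in test_subjects]
    test = [(s, c) for s, c in samples if s in test_subjects]
    return train, test
```

Splitting at the subject level rather than the clip level is what prevents leakage: a random clip-level split would typically place clips of the same person on both sides.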
Erdong Chen
Department of Computer Science, Drexel University
Yuyang Ji
Drexel
Computer vision, Vision Large Language Model
Jacob K. Greenberg
Department of Neurological Surgery, Washington University
Benjamin Steel
University of California, Berkeley
Faraz Arkam
Department of Neurological Surgery, Washington University
Abigail Lewis
Department of Neurological Surgery, Washington University
Pranay Singh
Department of Neurological Surgery, Washington University
Feng Liu
Assistant Professor, Drexel University
Computer Vision, Pattern Recognition, Machine Learning, Biometrics