From 3D Pose to Prose: Biomechanics-Grounded Vision–Language Coaching

📅 2026-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of generating precise, actionable, and biomechanically sound personalized fitness feedback from streaming video. The authors propose BioCoach, a framework that integrates visual appearance and 3D skeletal kinematics through a three-stage pipeline:

- an exercise-specific degree-of-freedom selector;
- a structured biomechanical context that combines individual morphometrics with motion-cycle and constraint analysis;
- a feedback-generation module conditioned on both vision and biomechanics.

Training is parameter-efficient: the vision and language backbones stay frozen, and cross-attention performs the multimodal fusion. The authors also introduce QEVD-bio-fit-coach, a biomechanics-oriented extension of QEVD-fit-coach. On it, BioCoach improves both lexical text-quality metrics and the scores of a biomechanics-aware LLM judge, while preserving accurate temporal triggering. On the original QEVD-fit-coach benchmark, it improves text quality and correctness with near-parity timing.
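The first two pipeline stages can be made more concrete with a small sketch. The code below is not the authors' implementation. It assumes a skeleton is given as named 3D joint positions per frame. The joint names, the `EXERCISE_DOFS` table, and the 90° depth limit are hypothetical placeholders. The sketch works in four steps:

1. It keeps only an exercise's salient joint angles, which is a crude stand-in for a DoF selector.
2. It computes one simple morphometric, a segment-length ratio.
3. It treats local minima of an angle series as repetition "bottoms".
4. It checks each bottom against a depth constraint.

```python
import math
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Frame = Dict[str, Vec3]  # joint name -> 3D position for one video frame

# Hypothetical exercise -> salient-DoF table. Each DoF is the angle defined by
# (proximal joint, vertex joint, distal joint). Illustrative only.
EXERCISE_DOFS: Dict[str, Dict[str, Tuple[str, str, str]]] = {
    "squat": {"left_knee": ("left_hip", "left_knee", "left_ankle")},
    "biceps_curl": {"left_elbow": ("left_shoulder", "left_elbow", "left_wrist")},
}


def joint_angle(a: Vec3, b: Vec3, c: Vec3) -> float:
    """Angle in degrees at vertex b between segments b->a and b->c."""
    u = [a[i] - b[i] for i in range(3)]
    w = [c[i] - b[i] for i in range(3)]
    cos = sum(x * y for x, y in zip(u, w)) / (math.hypot(*u) * math.hypot(*w))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp for float noise


def segment_ratio(frame: Frame, a: str, b: str, c: str, d: str) -> float:
    """Simple morphometric: length of segment a-b divided by length of segment c-d."""
    return math.dist(frame[a], frame[b]) / math.dist(frame[c], frame[d])


def select_dof_series(frames: List[Frame], exercise: str) -> Dict[str, List[float]]:
    """Keep only the exercise's salient DoFs; return one angle time series per DoF."""
    return {
        dof: [joint_angle(f[p], f[v], f[d]) for f in frames]
        for dof, (p, v, d) in EXERCISE_DOFS[exercise].items()
    }


def cycle_bottoms(angles: List[float]) -> List[int]:
    """Frame indices of strict local minima, a crude proxy for one repetition each."""
    return [
        i for i in range(1, len(angles) - 1)
        if angles[i] < angles[i - 1] and angles[i] < angles[i + 1]
    ]


def check_depth(angles: List[float], max_bottom_deg: float) -> List[Tuple[int, bool]]:
    """For each detected repetition, report whether the angle reached the depth limit."""
    return [(i, angles[i] <= max_bottom_deg) for i in cycle_bottoms(angles)]


if __name__ == "__main__":
    knee = [170.0, 120.0, 85.0, 120.0, 170.0, 130.0, 100.0, 130.0, 170.0]
    print(check_depth(knee, max_bottom_deg=90.0))  # [(2, True), (6, False)]
```

In this toy knee-angle series, the second repetition bottoms out at 100°, so a downstream feedback generator could flag it as too shallow. A real system would need smoothing, a more robust cycle detector, and exercise-specific constraint sets.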
📝 Abstract
We present BioCoach, a biomechanics-grounded vision–language framework for fitness coaching from streaming video. BioCoach fuses visual appearance and 3D skeletal kinematics through a novel three-stage pipeline:

- an exercise-specific degree-of-freedom selector that focuses analysis on salient joints;
- a structured biomechanical context that pairs individualized morphometrics with cycle and constraint analysis;
- a vision–biomechanics-conditioned feedback module that applies cross-attention to generate precise, actionable text.

Using parameter-efficient training that freezes the vision and language backbones, BioCoach yields transparent, personalized reasoning rather than pattern matching. To enable learning and fair evaluation, we augment QEVD-fit-coach with biomechanics-oriented feedback to create QEVD-bio-fit-coach, and we introduce a biomechanics-aware LLM-judge metric. BioCoach delivers clear gains on QEVD-bio-fit-coach across lexical and judge-based metrics while maintaining temporal triggering. On the original QEVD-fit-coach, it improves text quality and correctness with near-parity timing. These results demonstrate that explicit kinematics and constraints are key to accurate, phase-aware coaching.
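The abstract says the feedback module "applies cross-attention" but does not spell out the arrangement. The sketch below shows one plausible form: single-head scaled dot-product cross-attention in which text hidden states act as queries, and visual plus biomechanical tokens act as keys and values. The following choices are assumptions, not details from the paper:

- which side supplies the queries;
- the use of a single head;
- the projection shapes.

The code is written in plain Python so it stays self-contained. It performs no training. The docstring only notes where the trainable weights would sit in a frozen-backbone setup.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (rows x k) @ (k x cols) matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text: Matrix, context: Matrix,
                    w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head scaled dot-product cross-attention (illustrative).

    text:    (T, d_model) hidden states from a frozen language model -> queries
    context: (N, d_model) visual + biomechanical tokens -> keys and values
    w_q, w_k, w_v: (d_model, d_head) projections; in a parameter-efficient
    setup, weights like these would be trained while the backbones stay frozen.
    Returns a (T, d_head) matrix: each text token's weighted mix of context values.
    """
    q, k, v = matmul(text, w_q), matmul(context, w_k), matmul(context, w_v)
    scale = 1.0 / math.sqrt(len(w_k[0]))
    # scores[t][n]: affinity of text token t for context token n
    scores = [[scale * sum(a * b for a, b in zip(qt, kn)) for kn in k] for qt in q]
    return matmul([softmax(row) for row in scores], v)


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    context = [[1.0, 0.0], [0.0, 1.0]]  # e.g. one "knee angle" token, one "visual" token
    print(cross_attention([[0.0, 0.0]], context, eye, eye, eye))  # [[0.5, 0.5]]
```

Placing the kinematic tokens on the key/value side lets every generated word draw directly on joint-angle and constraint information. A zero query, as in the usage line, attends uniformly and returns the average of the context values.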
Problem

Research questions and friction points this paper is trying to address.

biomechanics
fitness coaching
3D pose estimation
vision-language model
personalized feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

biomechanics-grounded
vision-language coaching
3D skeletal kinematics
parameter-efficient training
cross-attention feedback