🤖 AI Summary
This study addresses the challenge of markerless, multi-view motion capture for spinal cord injury patients in home environments, where neither camera calibration nor hardware synchronization is available. To this end, the authors propose an agent-based multi-view video processing framework that, for the first time, integrates multimodal large language models with an agent mechanism into uncalibrated multi-view motion analysis. The approach enables self-synchronization of videos, consistent cross-view target tracking, and self-validation, and it combines monocular 2D pose estimation with uncalibrated geometric optimization to robustly extract joint angles. Compared to a Vicon system, joint angle estimation achieves a mean absolute error (MAE) of 5.97° ± 2.36° and a Pearson correlation coefficient of 0.962 ± 0.014, and the method significantly reduces reliance on traditional calibration and synchronization hardware.
📝 Abstract
Kinematic monitoring plays a critical role in long-term rehabilitation for patients with spinal cord injury (SCI), and multi-view markerless motion capture methods have shown significant potential for this task. However, because these methods rely on calibration and multi-view synchronization is difficult to achieve, deploying them in environments set up by patients themselves remains challenging. In this work, we propose an agentic pipeline for self-synchronized multi-view joint angle monitoring in uncalibrated environments using two cameras without hardware triggers. Multimodal large language models enable automatic video synchronization and agent-driven self-verification. State-of-the-art monocular 2D pose estimation models are employed to extract candidate poses, and an agent-based selection mechanism is then applied to automatically identify and track the target subject, thereby producing consistent 2D poses in the presence of multiple individuals and occlusions. These 2D poses are then optimized to estimate joint angles from uncalibrated multi-view pose sequences, with explicit geometric modeling ensuring interpretability. Validation against a Vicon system demonstrated strong performance, achieving a mean absolute error (MAE) of $5.97^\circ \pm 2.36^\circ$ and a Pearson correlation coefficient of $0.962 \pm 0.014$. The proposed method is expected to provide a practical, patient self-deployable system for daily kinematic monitoring in uncalibrated home environments.
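The two agreement metrics reported above, MAE and the Pearson correlation coefficient, can be illustrated with a minimal sketch. It assumes that the estimated and reference joint angle sequences are already time-aligned and sampled at the same frames. The function names and the sample values are hypothetical and are not taken from the paper, and the sketch says nothing about how the authors aggregate results across joints or trials.

```python
import math


def mean_absolute_error(estimated, reference):
    """Mean absolute difference (in degrees) between two angle sequences."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(estimated)


def pearson_correlation(estimated, reference):
    """Pearson correlation coefficient between two angle sequences."""
    n = len(estimated)
    if n != len(reference) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_e = sum(estimated) / n
    mean_r = sum(reference) / n
    # Sums of products of deviations from the respective means.
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimated, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_e == 0 or var_r == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_e * var_r)


# Hypothetical knee flexion angles in degrees (illustrative values only).
estimated_angles = [10.0, 20.0, 30.0]   # e.g. from a markerless pipeline
reference_angles = [12.0, 18.0, 33.0]   # e.g. from a marker-based system

mae = mean_absolute_error(estimated_angles, reference_angles)
r = pearson_correlation(estimated_angles, reference_angles)
print(f"MAE = {mae:.2f} deg, Pearson r = {r:.3f}")
```

The two metrics are complementary: MAE captures the absolute offset in degrees, while the Pearson coefficient captures how well the shape of the angle trajectory is followed, so a constant bias lowers agreement in MAE without affecting the correlation.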