🤖 AI Summary
AI-generated videos currently lack a systematic evaluation of how realistic and biomechanically plausible their human motion is. To address this gap, this work proposes HumanScore, a multidimensional evaluation framework designed specifically for human actions. It introduces six interpretable metrics spanning kinematic plausibility, temporal stability, and biomechanical consistency. The framework draws on a curated set of motion prompts and combines kinematic modeling, temporal analysis, and biomechanical constraints to quantitatively evaluate 13 state-of-the-art video generation models. The results yield a robust ranking of models by motion quality. They also reveal a consistent gap between visual fidelity and physical plausibility and identify common failure modes such as temporal jitter, anatomically implausible poses, and motion drift.
📝 Abstract
Recent advances in model architectures, compute, and data scale have driven rapid progress in video generation, producing increasingly realistic content. Yet no prior method systematically measures how faithfully these systems render human bodies and motion dynamics. In this paper, we present HumanScore, a systematic framework for evaluating the quality of human motion in AI-generated videos. HumanScore defines six interpretable metrics spanning kinematic plausibility, temporal stability, and biomechanical consistency, enabling fine-grained diagnosis beyond visual realism alone. Using carefully designed prompts, we elicit a diverse set of movements at varying intensities and evaluate videos generated by thirteen state-of-the-art models. Our analysis reveals consistent gaps between perceptual plausibility and the biomechanical fidelity of motion. It also identifies recurrent failure modes (e.g., temporal jitter, anatomically implausible poses, and motion drift) and produces robust model rankings based on quantitative, physically meaningful criteria.
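The abstract does not say how HumanScore's six metrics are computed. The sketch below is purely illustrative and is not the paper's definition. It shows two common proxies for the failure modes named above: temporal jitter, measured as the mean magnitude of the third finite difference (jerk) of joint positions, and anatomical inconsistency, measured as the variation of bone lengths over time. It assumes that per-frame 3D joint positions have already been extracted from a generated video, for example by an off-the-shelf pose estimator. The function names `jitter_score` and `bone_length_cv` and the `bones` skeleton layout are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Sequence, Tuple

# Hypothetical input layout: traj[t][j] is the (x, y, z) position of joint j at frame t.
Point = Tuple[float, float, float]
Trajectory = Sequence[Sequence[Point]]


def jitter_score(traj: Trajectory) -> float:
    """Illustrative jitter proxy: mean jerk magnitude over all joints and frames.

    Uses the third-order forward difference p[t+3] - 3p[t+2] + 3p[t+1] - p[t],
    in per-frame units (multiply by fps**3 for physical units). Smooth motion
    (constant velocity or constant acceleration) scores 0.
    """
    if len(traj) < 4:
        raise ValueError("need at least 4 frames to compute jerk")
    mags = []
    for t in range(len(traj) - 3):
        for j in range(len(traj[0])):
            d = [
                traj[t + 3][j][k] - 3 * traj[t + 2][j][k]
                + 3 * traj[t + 1][j][k] - traj[t][j][k]
                for k in range(3)
            ]
            mags.append(math.sqrt(sum(c * c for c in d)))
    return mean(mags)


def bone_length_cv(traj: Trajectory, bones: Sequence[Tuple[int, int]]) -> float:
    """Illustrative anatomical-consistency proxy.

    For each bone (pair of joint indices), compute the coefficient of variation
    (population std / mean) of its length across frames, then average over bones.
    A rigid skeleton scores 0; stretching or shrinking limbs raise the score.
    """
    cvs = []
    for a, b in bones:
        lengths = [math.dist(frame[a], frame[b]) for frame in traj]
        m = mean(lengths)
        cvs.append(pstdev(lengths) / m if m > 0 else 0.0)
    return mean(cvs)
```

For example, a two-joint segment translating at constant velocity scores 0 on both proxies. If one joint's position alternates by a small offset every frame, both scores become positive. A full evaluation would need many more such signals, plus calibration against human judgments. Those choices are not specified here.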