VMBench: A Benchmark for Perception-Aligned Video Motion Generation

📅 2025-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current video motion evaluation faces two key bottlenecks: misalignment between metrics and human perception, and limited diversity in motion prompts. To address these, we introduce VMBench—the first human-perception-aligned benchmark for video motion generation. It proposes a novel five-dimensional fine-grained motion evaluation framework grounded in human motion perception modeling. We further design a metadata-guided large language model prompting pipeline coupled with human-in-the-loop refinement, enabling construction of a high-diversity motion prompt library covering six dynamic scenarios. Additionally, VMBench incorporates a human preference annotation mechanism; Spearman correlation analysis demonstrates a 35.3% average improvement in alignment between automated scores and human judgments. VMBench will be open-sourced, establishing a reproducible, extensible paradigm for assessing motion quality in video generation.

📝 Abstract
Video generation has advanced rapidly, spurring progress in evaluation methods, yet assessing the motion in generated videos remains a major challenge. Specifically, there are two key issues: 1) current motion metrics do not fully align with human perception; 2) existing motion prompts are limited in diversity. Based on these findings, we introduce VMBench--a comprehensive Video Motion Benchmark that offers perception-aligned motion metrics and features the most diverse range of motion types. VMBench has several appealing properties: 1) Perception-Driven Motion Evaluation Metrics: we identify five dimensions based on human perception of motion in video assessment and develop fine-grained evaluation metrics, providing deeper insights into models' strengths and weaknesses in motion quality. 2) Meta-Guided Motion Prompt Generation: a structured method that extracts meta-information, generates diverse motion prompts with LLMs, and refines them through human-AI validation, resulting in a multi-level prompt library covering six key dynamic scene dimensions. 3) Human-Aligned Validation Mechanism: we provide human preference annotations to validate our benchmark, with our metrics achieving an average 35.3% improvement in Spearman's correlation over baseline methods. This is the first time that the quality of motion in videos has been evaluated from the perspective of alignment with human perception. Additionally, we will soon release VMBench at https://github.com/GD-AIGC/VMBench, setting a new standard for evaluating and advancing motion generation models.
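The validation mechanism hinges on Spearman's rank correlation between automated metric scores and human ratings. As a hedged illustration (the scores below are made up, not VMBench data, and the helper `spearman_rho` is our own sketch rather than the benchmark's code), the comparison can be computed like this:

```python
def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks,
    with average ranks assigned to ties."""
    def rank(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        i = 0
        while i < len(v):
            j = i
            # extend j over a run of tied values
            while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1  # ranks are 1-based
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r

    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-video automated motion scores vs. human preference ratings.
auto_scores = [0.82, 0.45, 0.67, 0.91, 0.30]
human_scores = [4.5, 2.0, 3.5, 3.0, 1.5]
print(round(spearman_rho(auto_scores, human_scores), 3))  # -> 0.7
```

A higher rho means the automated metric orders videos more like human annotators do; the paper's reported 35.3% figure is an average gain in this correlation over baseline metrics.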
Problem

Research questions and friction points this paper is trying to address.

Develops perception-aligned metrics for video motion evaluation.
Creates diverse motion prompts using meta-guided generation methods.
Introduces human-aligned validation to improve motion quality assessment.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Perception-driven motion evaluation metrics
Meta-guided motion prompt generation
Human-aligned validation mechanism
Xinrang Ling
APMP, Alibaba Group
Chen Zhu
APMP, Alibaba Group
Meiqi Wu
University of Chinese Academy of Sciences
Computer vision
Hangyu Li
APMP, Alibaba Group
Xiaokun Feng
Institute of Automation, Chinese Academy of Sciences
Computer vision, deep learning
Cundian Yang
APMP, Alibaba Group
Aiming Hao
AMAP, Alibaba Group
Video generation, video comprehension
Jiashu Zhu
APMP, Alibaba Group
Jiahong Wu
APMP, Alibaba Group
Xiangxiang Chu
APMP, Alibaba Group