T2MBench: A Benchmark for Out-of-Distribution Text-to-Motion Generation

πŸ“… 2026-02-14
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the lack of systematic evaluation of text-to-motion generation models under out-of-distribution (OOD) complex textual conditions. To this end, the authors construct the first benchmark specifically designed for OOD scenarios, comprising 1,025 OOD textual descriptions, and introduce a unified multidimensional evaluation framework that integrates large language model–based assessment, multifactor motion quality metrics, and fine-grained accuracy evaluation. A comprehensive evaluation of 14 representative models using this framework reveals heterogeneous performance in semantic alignment, motion generalization, and physical plausibility, yet consistently highlights deficiencies in fine-grained accuracy. These findings provide clear guidance for future model development in text-to-motion generation.

πŸ“ Abstract
Most existing evaluations of text-to-motion generation focus on in-distribution textual inputs and a limited set of evaluation criteria, which restricts their ability to systematically assess model generalization and motion generation capabilities under complex out-of-distribution (OOD) textual conditions. To address this limitation, we propose a benchmark specifically designed for OOD text-to-motion evaluation, which includes a comprehensive analysis of 14 representative baseline models and two datasets derived from the evaluation results. Specifically, we construct an OOD prompt dataset consisting of 1,025 textual descriptions. Based on this prompt dataset, we introduce a unified evaluation framework that integrates LLM-based evaluation, multi-factor motion evaluation, and fine-grained accuracy evaluation. Our experimental results reveal that while different baseline models demonstrate strengths in areas such as text-to-motion semantic alignment, motion generalizability, and physical quality, most models struggle to achieve strong performance in fine-grained accuracy evaluation. These findings highlight the limitations of existing methods in OOD scenarios and offer practical guidance for the design and evaluation of future production-level text-to-motion models.
Problem

Research questions and friction points this paper is trying to address.

out-of-distribution, text-to-motion generation, evaluation benchmark, generalization, motion generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

out-of-distribution, text-to-motion, benchmark, fine-grained evaluation, LLM-based evaluation
πŸ”Ž Similar Papers
No similar papers found.
πŸ‘₯ Authors
Bin Yang, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Rong Ou, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Weisheng Xu, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Jiaqi Xiong, University of Oxford, Oxford, United Kingdom
Xintao Li, University of Miami
Taowen Wang, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Luyu Zhu, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Xu Jiang, Duke University
Jing Tan, The Chinese University of Hong Kong
Renjing Xu, HKUST(GZ)