RoleMotion: A Large-Scale Dataset towards Robust Scene-Specific Role-Playing Motion Synthesis with Fine-grained Descriptions

📅 2025-12-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing motion datasets suffer from functional singularity, insufficient scene coverage, inconsistent quality, and coarse-grained textual annotations. To address these limitations, this work introduces the first high-quality, scenario-oriented human motion dataset tailored for role-playing applications. It encompasses 25 canonical scenarios, 110 functional roles, and over 500 distinct behaviors, comprising 10,296 high-fidelity full-body + hand motion sequences and 27,831 fine-grained natural language descriptions. We propose a novel scenario- and role-driven functional motion annotation framework and design a dedicated evaluator to assess text-to-motion generation performance. Experimental results demonstrate that our dataset significantly enhances generated motions’ scene adaptability, semantic consistency, and cross-modal alignment—thereby providing robust support for text-driven full-body motion synthesis.

📝 Abstract
In this paper, we introduce RoleMotion, a large-scale human motion dataset that encompasses a wealth of role-playing and functional motion data tailored to specific scenes. Existing text-to-motion datasets are mostly assembled piecemeal from assorted subsets, so their data are non-functional and too isolated to jointly cover social activities across varied scenes. Moreover, the quality of motion data is inconsistent, and the textual annotations in these datasets lack fine-grained detail. In contrast, RoleMotion is meticulously designed and collected with a particular focus on scenes and roles. The dataset features 25 classic scenes, 110 functional roles, over 500 behaviors, and 10,296 high-quality human motion sequences of body and hands, annotated with 27,831 fine-grained text descriptions. We build an evaluator stronger than existing counterparts, demonstrate its reliability, and evaluate various text-to-motion methods on our dataset. Finally, we explore the interplay between body and hand motion generation. Experimental results demonstrate the high quality and functionality of our dataset for text-driven whole-body generation.
Problem

Research questions and friction points this paper is trying to address.

Addresses lack of functional motion data for diverse scenes
Improves inconsistent motion quality and coarse text annotations
Enables robust scene-specific role-playing motion synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale dataset for role-playing motion synthesis
Fine-grained text descriptions for high-quality motion sequences
Evaluator for text-to-motion methods and whole-body generation
Junran Peng
Associate Professor at USTB
3D AIGC · 3D Comprehension and Reconstruction · Embodied AI
Yiheng Huang
Fudan University
Software Supply Chain
Silei Shen
University of Science and Technology Beijing
Zeji Wei
University of Science and Technology Beijing
Jingwei Yang
China University of Mining and Technology
Baojie Wang
University of Science and Technology Beijing
Yonghao He
D-Robotics
Chuanchen Luo
Shandong University
3D Vision · Generative AI · Spatial Intelligence · Human-Centric Perception
Man Zhang
Beijing University of Posts and Telecommunications
Xucheng Yin
University of Science and Technology Beijing
Wei Sui
Horizon Robotics
3D Vision · BEV Perception · 3D Reconstruction