MotionScript: Natural Language Descriptions for Expressive 3D Human Motions

📅 2023-12-19
🏛️ arXiv.org
📈 Citations: 5
Influential: 1
🤖 AI Summary
This work addresses the lack of fine-grained, expressive, and interpretable natural language descriptions for 3D human motion. We propose MotionScript—a zero-shot, unsupervised framework that maps motion sequences to structured natural language without training data. Its core is a template-based generation mechanism integrating kinematic priors and semantic rules, enabling systematic production of expressive descriptions covering affective states, stylistic gait, and human–object/human–human interactions. Methodologically, domain-knowledge-guided structured templates collaborate with large language models to jointly optimize linguistic fidelity and motion alignment, substantially improving generalization and diversity in text-driven motion generation. Experiments demonstrate significant performance gains in out-of-distribution motion synthesis. Moreover, we introduce the first large-scale, explainable language annotation resource specifically designed for expressive motion—enabling high-fidelity animation, virtual human simulation, and robot instruction grounding.
📝 Abstract
We introduce MotionScript, a novel framework for generating highly detailed, natural language descriptions of 3D human motions. Unlike existing motion datasets that rely on broad action labels or generic captions, MotionScript provides fine-grained, structured descriptions that capture the full complexity of human movement including expressive actions (e.g., emotions, stylistic walking) and interactions beyond standard motion capture datasets. MotionScript serves as both a descriptive tool and a training resource for text-to-motion models, enabling the synthesis of highly realistic and diverse human motions from text. By augmenting motion datasets with MotionScript captions, we demonstrate significant improvements in out-of-distribution motion generation, allowing large language models (LLMs) to generate motions that extend beyond existing data. Additionally, MotionScript opens new applications in animation, virtual human simulation, and robotics, providing an interpretable bridge between intuitive descriptions and motion synthesis. To the best of our knowledge, this is the first attempt to systematically translate 3D motion into structured natural language without requiring training data.
Problem

Research questions and friction points this paper is trying to address.

Generates detailed natural language descriptions for 3D human motions.
Enhances text-to-motion models for realistic and diverse motion synthesis.
Improves out-of-distribution motion generation using structured motion captions.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates detailed natural language for 3D motions.
Enhances text-to-motion models with structured descriptions.
Improves out-of-distribution motion generation capabilities.
Payam Jome Yazdian
School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
Eric Liu
University of Toronto
Security, Compilers, Fuzzing
Li Cheng
Dept. of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, Canada
Angelica Lim
School of Computing Science, Simon Fraser University, Burnaby, BC, Canada