🤖 AI Summary
MotionGlot addresses the core challenges of cross-embodiment motion generation—mismatched action-space dimensions across heterogeneous agents (e.g., quadrupeds and humans), scarcity of high-quality annotated data, and difficulty of text-motion alignment—by adapting large language model (LLM) training paradigms to motion synthesis. To this end, it introduces: (1) a mechanism for aligning the motion spaces of multiple embodiments; (2) a joint text-motion representation trained with instruction fine-tuning; and (3) two new datasets, the first directionally annotated quadruped locomotion dataset and a large corpus of situational text prompts for human motion generation. Evaluated on six generation tasks, MotionGlot improves over prior methods by 35.3% on average and is deployed end-to-end on a physical quadruped robot. This work establishes a paradigm for text-driven, general-purpose motion generation across diverse embodied agents.
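One way to picture the alignment of heterogeneous motion spaces is to map each embodiment's discrete motion codebook into a disjoint slice of a single shared token vocabulary, so one autoregressive model can emit motion tokens for either agent. The sketch below illustrates this idea; the vocabulary size, codebook sizes, and function names are illustrative assumptions, not MotionGlot's actual implementation.

```python
# Hypothetical sketch: mapping per-embodiment motion codebooks into one
# shared token vocabulary so a single autoregressive model can generate
# motions for agents with different action dimensions. All names and
# sizes below are illustrative assumptions, not the paper's API.

TEXT_VOCAB_SIZE = 50_257          # e.g., a GPT-2-style text tokenizer (assumed)
CODEBOOK_SIZES = {                # one discrete motion codebook per embodiment
    "human": 512,                 # assumed codebook sizes (illustrative)
    "quadruped": 256,
}

def build_offsets(text_vocab_size: int, codebook_sizes: dict[str, int]) -> dict[str, int]:
    """Assign each embodiment a disjoint token-ID range after the text tokens."""
    offsets, cursor = {}, text_vocab_size
    for name, size in codebook_sizes.items():
        offsets[name] = cursor
        cursor += size
    return offsets

OFFSETS = build_offsets(TEXT_VOCAB_SIZE, CODEBOOK_SIZES)

def motion_to_tokens(embodiment: str, code_indices: list[int]) -> list[int]:
    """Shift raw codebook indices into that embodiment's global ID range."""
    offset = OFFSETS[embodiment]
    assert all(0 <= i < CODEBOOK_SIZES[embodiment] for i in code_indices)
    return [offset + i for i in code_indices]

# A human clip and a quadruped clip now live in one shared vocabulary:
print(motion_to_tokens("human", [0, 17, 255]))      # IDs in the human range
print(motion_to_tokens("quadruped", [0, 17, 255]))  # IDs in the quadruped range
```

Keeping the ranges disjoint means the model's output distribution can be masked per embodiment at inference time, which is one common design choice for multi-domain token vocabularies.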
📝 Abstract
This paper introduces MotionGlot, a model that can generate motion across multiple embodiments with different action dimensions, such as quadruped robots and human bodies. By leveraging the well-established training procedures commonly used in large language models (LLMs), we introduce an instruction-tuning template specifically designed for motion-related tasks. Our approach demonstrates that the principles underlying LLM training can be successfully adapted to learn a wide range of motion generation tasks across multiple embodiments with different action dimensions. We demonstrate the abilities of MotionGlot on a set of six tasks and report an average improvement of 35.3% across them. Additionally, we contribute two new datasets: (1) a dataset of expert-controlled quadruped locomotion comprising approximately 48,000 trajectories paired with direction-based text annotations, and (2) a dataset of over 23,000 situational text prompts for human motion generation tasks. Finally, we conduct hardware experiments to validate the capabilities of our system in real-world applications.
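To make the instruction-tuning idea concrete, the following minimal sketch shows one plausible way to render a (task instruction, text input, motion-token target) triple as a training string, assuming motions have already been discretized into tokens such as `<motion_42>`. The field labels and wording are our assumptions; the paper's exact template may differ.

```python
# Minimal sketch of an LLM-style instruction template adapted to motion
# tasks. The "### Instruction / Input / Response" layout is a common
# instruction-tuning convention, assumed here for illustration only.

def format_example(instruction: str, inp: str, output_tokens: list[str]) -> str:
    """Render one (instruction, input, target-motion) triple as a training string."""
    return (
        f"### Instruction:\n{instruction}\n"
        f"### Input:\n{inp}\n"
        f"### Response:\n{' '.join(output_tokens)}"
    )

example = format_example(
    instruction="Generate a motion that matches the text description.",
    inp="a person walks forward and turns left",
    output_tokens=[f"<motion_{i}>" for i in (42, 7, 311, 98)],  # hypothetical motion tokens
)
print(example)
```

Under this framing, swapping the instruction line is enough to cover different tasks (e.g., text-to-motion for a human versus direction-conditioned locomotion for a quadruped) within one training pipeline.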