Scaling Large Motion Models with Million-Level Human Motions

📅 2024-10-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Human motion understanding is hindered by the scarcity of high-quality action data, impeding the development of truly general-purpose large models. Method: We introduce MotionLib—the first million-scale, high-fidelity human motion dataset, 15× larger than existing benchmarks—and train Being-M0, a general-purpose large motion model. We propose MotionBook, a novel motion encoding paradigm integrating lossless, compact feature representation with a lookup-free 2D motion tokenizer, significantly enhancing representational capacity and fine-grained motion preservation. Our approach combines large-scale self-supervised pretraining, hierarchical textual annotation, and lossless feature compression, and empirically validates data–model co-scaling laws. Results: Being-M0 achieves robust performance across diverse motion generation tasks—including zero-shot generalization to unseen action categories—establishing a new state-of-the-art baseline for motion understanding and generation.

📝 Abstract
Inspired by the recent success of LLMs, the field of human motion understanding has increasingly shifted toward developing large motion models. Despite some progress, current efforts remain far from achieving truly generalist models, primarily due to the lack of massive high-quality data. To address this gap, we present MotionLib, the first million-level dataset for motion generation, which is at least 15× larger than existing counterparts and enriched with hierarchical text descriptions. Using MotionLib, we train a large motion model named Being-M0, demonstrating robust performance across a wide range of human activities, including unseen ones. Through systematic investigation, for the first time, we highlight the importance of scaling both data and model size for advancing motion generation, along with key insights to achieve this goal. To better integrate the motion modality, we propose MotionBook, an innovative motion encoding approach including (1) a compact yet lossless feature to represent motions; (2) a novel 2D lookup-free motion tokenizer that preserves fine-grained motion details while expanding codebook capacity, significantly enhancing the representational power of motion tokens. We believe this work lays the groundwork for developing more versatile and powerful motion generation models in the future. For further details, visit https://github.com/BeingBeyond/Being-M0.
Problem

Research questions and friction points this paper is trying to address.

Lack of massive high-quality data for large motion models
Need for scaling data and model size in motion generation
Improving motion encoding for better representational power
Innovation

Methods, ideas, or system contributions that make the work stand out.

Million-level dataset MotionLib for motion generation
Large motion model Being-M0 trained on MotionLib
MotionBook encoding with a compact lossless feature and a 2D lookup-free tokenizer
Ye Wang
Renmin University
Sipeng Zheng
BeingBeyond
Computer Vision, Large Multimodal Model, Embodied AI
Bin Cao
Beijing Academy of Artificial Intelligence, Institute of Automation, Chinese Academy of Sciences
Qianshan Wei
Southeast University
Qin Jin
School of Information, Renmin University of China
Artificial Intelligence
Zongqing Lu
Peking University | BeingBeyond
Reinforcement Learning