OpenT2M: No-frill Motion Generation with Open-source, Large-scale, High-quality Data

📅 2026-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited generalization of existing text-to-motion generation models, which stems from small-scale and low-diversity datasets. To overcome this, the authors introduce OpenT2M, a large-scale open-source dataset comprising over 2,800 hours of high-quality human motion, and propose MonoFrill, a lightweight pre-trained model. Central to their approach is a biologically inspired 2D-PRQ motion tokenizer based on body-part segmentation, coupled with an automated pipeline for synthesizing long motion sequences. The framework further incorporates multi-granularity filtering and physics-based feasibility validation to enhance zero-shot generalization. Experimental results demonstrate that OpenT2M significantly advances text-to-motion modeling, with the 2D-PRQ tokenizer achieving superior performance in both motion reconstruction accuracy and zero-shot motion generation.

📝 Abstract
Text-to-motion (T2M) generation aims to create realistic human movements from text descriptions, with promising applications in animation and robotics. Despite recent progress, current T2M models perform poorly on unseen text descriptions due to the small scale and limited diversity of existing motion datasets. To address this problem, we introduce OpenT2M, a million-scale, high-quality, open-source motion dataset containing over 2,800 hours of human motion. Each sequence undergoes rigorous quality control through physical feasibility validation and multi-granularity filtering, with detailed second-wise text annotations. We also develop an automated pipeline for creating long-horizon sequences, enabling complex motion generation. Building upon OpenT2M, we introduce MonoFrill, a pretrained motion model that achieves compelling T2M results without complicated designs or technical tricks as "frills". Its core component is 2D-PRQ, a novel motion tokenizer that captures spatiotemporal dependencies by dividing the human body into biological parts. Experiments show that OpenT2M significantly improves the generalization of existing T2M models, while 2D-PRQ achieves superior reconstruction and strong zero-shot performance. We expect OpenT2M and MonoFrill to advance the T2M field by addressing longstanding data quality and benchmarking challenges.
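The abstract only states that 2D-PRQ divides the body into biologically meaningful parts before tokenization; it does not give the partition or the quantizer details. The following is a minimal sketch of the part-segmentation step under assumed details: a SMPL-like 22-joint skeleton and a five-part grouping (`BODY_PARTS` and `split_motion_by_part` are hypothetical names, not from the paper).

```python
import numpy as np

# Hypothetical body-part grouping for a part-aware motion tokenizer.
# Joint indices follow a SMPL-like 22-joint skeleton; the actual
# partition used by 2D-PRQ is not specified in the abstract.
BODY_PARTS = {
    "torso":     [0, 3, 6, 9, 12, 15],   # pelvis, spine chain, neck, head
    "left_arm":  [13, 16, 18, 20],
    "right_arm": [14, 17, 19, 21],
    "left_leg":  [1, 4, 7, 10],
    "right_leg": [2, 5, 8, 11],
}

def split_motion_by_part(motion):
    """Split a motion tensor of shape (T, J, C) into per-part tensors.

    Each part stream can then be fed to its own quantizer, so the
    resulting tokens are indexed along two axes: body part (spatial)
    and time (temporal).
    """
    return {name: motion[:, idx, :] for name, idx in BODY_PARTS.items()}

# Example: 60 frames, 22 joints, 3-D joint positions.
motion = np.random.randn(60, 22, 3)
parts = split_motion_by_part(motion)
for name, tensor in parts.items():
    print(name, tensor.shape)
```

Splitting before quantization is one plausible reading of how a "2D" token grid (parts x time) would arise; the per-part quantizers themselves (e.g. residual codebooks) are omitted here.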
Problem

Research questions and friction points this paper is trying to address.

Text-to-motion
motion dataset
generalization
data diversity
zero-shot performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

OpenT2M
2D-PRQ
text-to-motion
motion tokenizer
zero-shot generation