HY-Motion 1.0: Scaling Flow Matching Models for Text-To-Motion Generation

📅 2025-12-29
🤖 AI Summary
This work addresses three key challenges in text-to-3D human motion generation: low motion fidelity, weak instruction alignment, and narrow action category coverage. We scale Diffusion Transformer (DiT)-based flow matching models to the billion-parameter level, the first successful attempt to do so in the motion generation domain, and introduce a full-stage training paradigm: large-scale pretraining, high-quality fine-tuning, and reinforcement learning from both human feedback and reward models. Pretraining uses over 3,000 hours of multi-source motion data that has been cleaned and captioned, fine-tuning uses 400 hours of curated data, and the reinforcement learning stage targets precise alignment with the text instruction and high motion quality. The resulting models cover over 200 motion categories across 6 major classes and deliver instruction-following capabilities that significantly outperform current open-source benchmarks. We release HY-Motion 1.0 to the open-source community to foster future research and accelerate commercial adoption.

📝 Abstract
We present HY-Motion 1.0, a series of state-of-the-art, large-scale, motion generation models capable of generating 3D human motions from textual descriptions. HY-Motion 1.0 represents the first successful attempt to scale up Diffusion Transformer (DiT)-based flow matching models to the billion-parameter scale within the motion generation domain, delivering instruction-following capabilities that significantly outperform current open-source benchmarks. Uniquely, we introduce a comprehensive, full-stage training paradigm -- including large-scale pretraining on over 3,000 hours of motion data, high-quality fine-tuning on 400 hours of curated data, and reinforcement learning from both human feedback and reward models -- to ensure precise alignment with the text instruction and high motion quality. This framework is supported by our meticulous data processing pipeline, which performs rigorous motion cleaning and captioning. Consequently, our model achieves the most extensive coverage, spanning over 200 motion categories across 6 major classes. We release HY-Motion 1.0 to the open-source community to foster future research and accelerate the transition of 3D human motion generation models towards commercial maturity.
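
The abstract does not spell out the exact flow matching formulation, so the following is only a minimal sketch of one common variant (a linear interpolation path between noise and data), not the authors' implementation. The names `flow_matching_loss` and `velocity_fn`, and the motion tensor layout `(batch, frames, features)`, are assumptions made for illustration; `velocity_fn` stands in for a text-conditioned DiT.

```python
import numpy as np

def flow_matching_loss(velocity_fn, x1, cond, rng):
    """Monte Carlo estimate of a linear-path flow matching loss.

    velocity_fn(xt, t, cond) is a hypothetical stand-in for the network;
    x1 is a batch of clean motion clips, cond is the text conditioning.
    """
    batch = x1.shape[0]
    # Sample Gaussian noise as the starting point of each path.
    x0 = rng.standard_normal(x1.shape)
    # One time value per sample in [0, 1), broadcastable over the clip.
    t = rng.uniform(size=(batch,) + (1,) * (x1.ndim - 1))
    # Point on the straight line from noise (t=0) to data (t=1).
    xt = (1.0 - t) * x0 + t * x1
    # Along a straight path the target velocity is constant: x1 - x0.
    target = x1 - x0
    pred = velocity_fn(xt, t, cond)
    return float(np.mean((pred - target) ** 2))

# Toy usage: 8 clips, 16 frames, 6 features, with a zero-velocity "model".
rng = np.random.default_rng(0)
x1 = rng.standard_normal((8, 16, 6))
zero_model = lambda xt, t, cond: np.zeros_like(xt)
loss = flow_matching_loss(zero_model, x1, None, np.random.default_rng(1))
```

In an actual training loop the loss would be minimized with respect to the network parameters using an autodiff framework; NumPy is used here only to keep the objective itself easy to read.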
Problem

Research questions and friction points this paper is trying to address.

Generating high-quality 3D human motions from textual descriptions
Scaling flow matching models to the billion-parameter level for motion generation
Ensuring precise text-motion alignment across a broad range of motion categories
Innovation

Methods, ideas, or system contributions that make the work stand out.

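Scaling Diffusion Transformer (DiT)-based flow matching models to the billion-parameter level
Full-stage training: pretraining on over 3,000 hours of motion data, fine-tuning on 400 hours of curated data, and reinforcement learning from human feedback and reward models
Data processing pipeline with rigorous motion cleaning and captioning, supporting over 200 motion categories across 6 major classes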
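
For readers unfamiliar with how a flow matching model produces a sample, the sketch below integrates a learned velocity field from noise (t=0) to data (t=1) with fixed-step Euler updates. It is an illustrative assumption, not the sampler used by HY-Motion 1.0: the function name `euler_sample`, the step count, and the callable `velocity_fn` (a stand-in for the text-conditioned network) are all hypothetical.

```python
import numpy as np

def euler_sample(velocity_fn, shape, cond, num_steps=50, rng=None):
    """Generate samples by Euler integration of dx/dt = v(x, t, cond).

    velocity_fn is a hypothetical stand-in for a trained network.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Start from Gaussian noise at t = 0.
    x = rng.standard_normal(shape)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        # Same time value for every sample, broadcastable over the clip.
        t = np.full((shape[0],) + (1,) * (len(shape) - 1), i * dt)
        # One explicit Euler step along the predicted velocity.
        x = x + dt * velocity_fn(x, t, cond)
    return x

# Toy usage: a velocity field that transports any point to a fixed target.
target_motion = np.ones((2, 16, 6))
toy_field = lambda x, t, cond: (target_motion - x) / (1.0 - t)
sample = euler_sample(toy_field, target_motion.shape, None,
                      num_steps=50, rng=np.random.default_rng(0))
```

Fewer steps trade accuracy for speed; higher-order ODE solvers are a common alternative when the learned field is strongly curved.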
Authors
Yuxin Wen
Tencent Hunyuan 3D Digital Human Team
Qing Shuai
Tencent
computer vision
Di Kang
Tencent Hunyuan 3D Digital Human Team
Jing Li
Tencent Hunyuan 3D Digital Human Team
Cheng Wen
Guangzhou Institute of Technology, Xidian University
Software Engineering, Software Testing, Cyber Security, AI for SE
Yue Qian
University of British Columbia
Family Demography, Gender, Life Course, Health, Chinese Society
Ningxin Jiao
Tencent Hunyuan 3D Digital Human Team
Changhai Chen
Tencent Hunyuan 3D Digital Human Team
Weijie Chen
Tencent Hunyuan 3D Digital Human Team
Yiran Wang
Tencent Hunyuan 3D Digital Human Team
Jinkun Guo
Tencent Hunyuan 3D Digital Human Team
Dongyue An
Tencent Hunyuan 3D Digital Human Team
Han Liu
Tencent Hunyuan 3D Digital Human Team
Yanyu Tong
Tencent Hunyuan 3D Digital Human Team
Chao Zhang
Tencent Hunyuan 3D Digital Human Team
Qing Guo
Tencent Hunyuan 3D Digital Human Team
Juan Chen
School of Psychology, South China Normal University
psychology, neuroscience
Qiao Zhang
Tencent Hunyuan 3D Digital Human Team
Youyi Zhang
Tencent Hunyuan 3D Digital Human Team
Zihao Yao
University of Sydney
Graph Neural Networks, Deep Learning, Blockchain, Anomaly Detection
Cheng Zhang
Tencent Hunyuan 3D Digital Human Team
Hong Duan
Tencent Hunyuan 3D Digital Human Team
Xiaoping Wu
Tencent Hunyuan 3D Digital Human Team
Qi Chen
Tencent Hunyuan 3D Digital Human Team
Fei Cheng
Tencent Hunyuan 3D Digital Human Team