Motion-R1: Chain-of-Thought Reasoning and Reinforcement Learning for Human Motion Generation

📅 2025-06-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing text-to-action generation methods rely on end-to-end mapping, resulting in shallow semantic understanding, weak logical reasoning, poor action controllability, insufficient long-horizon consistency, and limited motion diversity. To address these limitations, we propose the first unified motion-language modeling framework that integrates Chain-of-Thought (CoT) reasoning with reinforcement learning. Our approach explicitly decomposes natural language instructions into structured action paths and introduces Group Relative Policy Optimization (GRPO), a novel RL algorithm that jointly optimizes CoT-based reasoning chain generation and motion synthesis. Leveraging large language models, the method performs multi-step semantic disentanglement and action path planning. Evaluated on multiple benchmarks, it achieves state-of-the-art performance, significantly improving long-horizon coherence, instruction fidelity, and motion diversity. All code, models, and data are publicly released.

Technology Category

Application Category

📝 Abstract
Recent advances in large language models, especially in natural language understanding and reasoning, have opened new possibilities for text-to-motion generation. Although existing approaches have made notable progress in semantic alignment and motion synthesis, they often rely on end-to-end mapping strategies that fail to capture deep linguistic structures and logical reasoning. Consequently, generated motions tend to lack controllability, consistency, and diversity. To address these limitations, we propose Motion-R1, a unified motion-language modeling framework that integrates a Chain-of-Thought mechanism. By explicitly decomposing complex textual instructions into logically structured action paths, Motion-R1 provides high-level semantic guidance for motion generation, significantly enhancing the model's ability to interpret and execute multi-step, long-horizon, and compositionally rich commands. To train our model, we adopt Group Relative Policy Optimization, a reinforcement learning algorithm designed for large models, which leverages motion quality feedback to optimize reasoning chains and motion synthesis jointly. Extensive experiments across multiple benchmark datasets demonstrate that Motion-R1 achieves competitive or superior performance compared to state-of-the-art methods, particularly in scenarios requiring nuanced semantic understanding and long-term temporal coherence. The code, model and data will be publicly available.
Problem

Research questions and friction points this paper is trying to address.

Improving controllability and consistency in text-to-motion generation
Enhancing motion diversity via structured linguistic reasoning
Addressing multi-step command execution in motion synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-Thought reasoning for motion generation
Group Relative Policy Optimization algorithm
Decomposing text into structured action paths
🔎 Similar Papers
No similar papers found.
R
Runqi Ouyang
GigaAI, CASIA
Haoyun Li
Haoyun Li
Institute of Automation, Chinese Academy of Sciences
computer vision
Z
Zhenyuan Zhang
GigaAI, HKUST
X
Xiaofeng Wang
GigaAI
Z
Zheng Zhu
GigaAI
G
Guan Huang
GigaAI
X
Xingang Wang
GigaAI