FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions

📅 2026-01-19
🤖 AI Summary
Humanoid robots still struggle to perform general-purpose whole-body motions directly from natural language instructions. This work proposes FRoM-W1, an open-source two-stage framework. The first stage, H-GPT, is a language-conditioned whole-body motion generation model trained on large-scale human motion data, with Chain-of-Thought used to improve generalization in instruction understanding. The second stage, H-ACT, retargets the generated human motions into robot-specific actions and executes them with a motion controller that is pretrained and then fine-tuned with reinforcement learning in physics simulation, before deployment on real robots through a modular sim-to-real module. H-GPT is evaluated on the HumanML3D-X benchmark for human whole-body motion generation, and the reinforcement learning fine-tuning improves both motion tracking accuracy and task success rates on Unitree H1 and G1 humanoid robots.

📝 Abstract
Humanoid robots are capable of performing various actions such as greeting, dancing and even backflipping. However, these motions are often hard-coded or specifically trained, which limits their versatility. In this work, we present FRoM-W1, an open-source framework designed to achieve general humanoid whole-body motion control using natural language. To universally understand natural language and generate corresponding motions, as well as enable various humanoid robots to stably execute these motions in the physical world under gravity, FRoM-W1 operates in two stages: (a) H-GPT: Utilizing massive human data, a large-scale language-driven human whole-body motion generation model is trained to generate diverse natural behaviors. We further leverage the Chain-of-Thought technique to improve the model's generalization in instruction understanding. (b) H-ACT: After retargeting generated human whole-body motions into robot-specific actions, a motion controller that is pretrained and further fine-tuned through reinforcement learning in physical simulation enables humanoid robots to accurately and stably perform corresponding actions. It is then deployed on real robots via a modular simulation-to-reality module. We extensively evaluate FRoM-W1 on Unitree H1 and G1 robots. Results demonstrate superior performance on the HumanML3D-X benchmark for human whole-body motion generation, and our introduced reinforcement learning fine-tuning consistently improves both motion tracking accuracy and task success rates of these humanoid robots. We open-source the entire FRoM-W1 framework and hope it will advance the development of humanoid intelligence.
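The two-stage data flow described in the abstract (instruction to human motion, retargeting to a robot reference, then closed-loop tracking) can be illustrated with a small Python sketch. This is not the FRoM-W1 API: `Pipeline`, `toy_generate`, `toy_retarget`, `toy_track`, the two-value pose format, and the joint limit of 1.5 are all hypothetical stand-ins chosen only to make the flow runnable.

```python
from dataclasses import dataclass
from typing import Callable, List

Pose = List[float]  # hypothetical: one frame as a flat list of joint values


@dataclass
class Pipeline:
    """Illustrative two-stage flow; every field is a stand-in, not the paper's code."""
    generate_motion: Callable[[str], List[Pose]]   # stage (a): text -> human motion
    retarget: Callable[[List[Pose]], List[Pose]]   # human motion -> robot reference
    track: Callable[[Pose, Pose], Pose]            # stage (b): (state, target) -> next state

    def run(self, instruction: str, initial_state: Pose) -> List[Pose]:
        human_motion = self.generate_motion(instruction)
        reference = self.retarget(human_motion)
        state = initial_state
        trajectory = []
        # Step the controller once per reference frame, feeding back its own state.
        for target in reference:
            state = self.track(state, target)
            trajectory.append(state)
        return trajectory


def toy_generate(instruction: str) -> List[Pose]:
    # Assumption: a fixed three-frame motion replaces a learned generator.
    return [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]


def toy_retarget(motion: List[Pose]) -> List[Pose]:
    # Assumption: retargeting is reduced to clamping to made-up joint limits.
    limit = 1.5
    return [[max(-limit, min(limit, q)) for q in frame] for frame in motion]


def toy_track(state: Pose, target: Pose) -> Pose:
    # Assumption: a proportional step replaces the learned tracking policy.
    return [s + 0.5 * (t - s) for s, t in zip(state, target)]


pipeline = Pipeline(toy_generate, toy_retarget, toy_track)
trajectory = pipeline.run("wave your right hand", [0.0, 0.0])
print(trajectory)
```

Keeping the three steps as separate callables mirrors the modular split the abstract describes, so a different generator, retargeting routine, or controller could be swapped in without changing the loop.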
Problem

Research questions and friction points this paper is trying to address.

humanoid robots
whole-body control
natural language instructions
motion generation
simulation-to-reality
Innovation

Methods, ideas, or system contributions that make the work stand out.

language-driven motion generation
humanoid whole-body control
Chain-of-Thought prompting
reinforcement learning fine-tuning
sim-to-real transfer
Authors

Peng Li, Fudan University
Zihan Zhuang, Fudan University
Yang Gao, Fudan University
Yi Dong, Fudan University
Sixian Li, Master's degree student, Fudan University (NLP)
Changhao Jiang, Fudan University
Shihan Dou, Fudan University (LLMs, Code LMs, RL, Alignment)
Zhiheng Xi, Fudan University (LLM Reasoning, LLM-based Agents)
Enyu Zhou, Fudan University
Jixuan Huang, Fudan University
Hui Li, Fudan University
Jingjing Gong, SII (Machine Learning, AI for Science, Large Language Model, Embodied AI)
Xingjun Ma, Fudan University (Trustworthy AI, Multimodal AI, Generative AI, Embodied AI)
Tao Gui, Fudan University
Zuxuan Wu, Fudan University
Qi Zhang, Fudan University (SAGIN, satellite routing)
Xuanjing Huang, Fudan University
Yu-Gang Jiang, Professor, Fudan University; IEEE & IAPR Fellow (Video Analysis, Embodied AI, Trustworthy AI)
Xipeng Qiu, Fudan University