LangWBC: Language-directed Humanoid Whole-Body Control via End-to-end Learning

📅 2025-04-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the semantic–physical gap between natural language instructions and full-body humanoid robot motions, enabling end-to-end language-driven whole-body control. We propose the first unified neural architecture that jointly performs language understanding and real-world full-body motion generation. The method integrates reinforcement learning, policy distillation, and a conditional variational autoencoder (CVAE): the CVAE explicitly models action priors to significantly improve motion diversity, compositional generalization, and robustness to semantic variations in instructions, while policy distillation ensures efficient sim-to-real transfer. Evaluated in simulation and on physical humanoid platforms (e.g., Unitree H1), our approach supports multi-step instruction parsing, seamless cross-skill transitions, and agile continuous locomotion. It establishes a new paradigm for intuitive and robust human–robot interaction through natural language.
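The teacher–student policy distillation mentioned above can be illustrated with a toy numpy sketch: a privileged "teacher" policy (trained with RL in simulation) labels states, and a deployable "student" regresses onto the teacher's actions, DAgger-style. All dimensions, learning rates, and network shapes here are illustrative stand-ins, not the paper's actual networks:

```python
import numpy as np

rng = np.random.default_rng(1)
OBS_DIM, ACT_DIM = 8, 3  # toy sizes, not the paper's

# Stand-ins for a privileged teacher (trained with RL in sim)
# and a student with the same architecture to be deployed on hardware.
W_teacher = rng.normal(0.0, 0.1, (OBS_DIM, ACT_DIM))
W_student = np.zeros((OBS_DIM, ACT_DIM))

# Distillation loop: sample states, label them with the teacher,
# and regress the student's actions onto the teacher's.
lr, batch = 0.5, 64
for step in range(300):
    obs = rng.normal(size=(batch, OBS_DIM))
    target = np.tanh(obs @ W_teacher)          # teacher action labels
    pred = np.tanh(obs @ W_student)            # student actions
    err = pred - target
    # Gradient of 0.5 * ||pred - target||^2 through the tanh
    grad = obs.T @ (err * (1.0 - pred**2)) / batch
    W_student -= lr * grad

# Functional gap between teacher and student on fresh states
test_obs = rng.normal(size=(256, OBS_DIM))
gap = np.mean(np.abs(np.tanh(test_obs @ W_teacher)
                     - np.tanh(test_obs @ W_student)))
```

Because the student shares the teacher's architecture and the regression loss has a zero-error minimizer, the gap shrinks essentially to zero; in the real pipeline the student additionally loses the teacher's privileged observations, which is what makes distillation (rather than weight copying) necessary.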

📝 Abstract
General-purpose humanoid robots are expected to interact intuitively with humans, enabling seamless integration into daily life. Natural language provides the most accessible medium for this purpose. However, translating language into humanoid whole-body motion remains a significant challenge, primarily due to the gap between linguistic understanding and physical actions. In this work, we present an end-to-end, language-directed policy for real-world humanoid whole-body control. Our approach combines reinforcement learning with policy distillation, allowing a single neural network to interpret language commands and execute corresponding physical actions directly. To enhance motion diversity and compositionality, we incorporate a Conditional Variational Autoencoder (CVAE) structure. The resulting policy achieves agile and versatile whole-body behaviors conditioned on language inputs, with smooth transitions between various motions, enabling adaptation to linguistic variations and the emergence of novel motions. We validate the efficacy and generalizability of our method through extensive simulations and real-world experiments, demonstrating robust whole-body control. Please see our website at LangWBC.github.io for more information.
Problem

Research questions and friction points this paper is trying to address.

Translate language into humanoid whole-body motion
Bridge the gap between linguistic understanding and physical action
Achieve agile, versatile behaviors from language commands
Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end language-directed policy for humanoid control
Combines reinforcement learning with policy distillation
Incorporates CVAE for motion diversity and compositionality
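The CVAE component listed above can be sketched minimally in numpy: an encoder maps the observation and a language embedding to a latent Gaussian, a sample is drawn via the reparameterization trick, and a decoder produces the action. All dimensions and weights below are toy illustrative stand-ins, not the paper's trained networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative, not from the paper)
OBS_DIM, LANG_DIM, LATENT_DIM, ACT_DIM = 8, 4, 2, 3

def linear(in_dim, out_dim):
    """Random weights standing in for trained parameters."""
    return rng.normal(0.0, 0.1, (in_dim, out_dim)), np.zeros(out_dim)

# Encoder: (observation, language embedding) -> latent Gaussian
W_mu, b_mu = linear(OBS_DIM + LANG_DIM, LATENT_DIM)
W_logvar, b_logvar = linear(OBS_DIM + LANG_DIM, LATENT_DIM)
# Decoder: (latent, observation, language embedding) -> action
W_dec, b_dec = linear(LATENT_DIM + OBS_DIM + LANG_DIM, ACT_DIM)

def cvae_policy(obs, lang_emb):
    """One forward pass: sample an action from the conditioned latent."""
    cond = np.concatenate([obs, lang_emb])
    mu = cond @ W_mu + b_mu
    logvar = cond @ W_logvar + b_logvar
    # Reparameterization trick: z = mu + sigma * eps
    z = mu + np.exp(0.5 * logvar) * rng.normal(size=LATENT_DIM)
    action = np.tanh(np.concatenate([z, cond]) @ W_dec + b_dec)
    return action, mu, logvar

obs = rng.normal(size=OBS_DIM)
lang_emb = rng.normal(size=LANG_DIM)   # e.g., an embedding of "walk forward"
action, mu, logvar = cvae_policy(obs, lang_emb)

# KL(q(z|obs, lang) || N(0, I)), the regularizer used when training a CVAE
kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
```

Sampling different latents under the same instruction is what yields the motion diversity the abstract describes; conditioning the latent on the language embedding is what lets semantic variations of a command map to related motions.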
Authors
Yiyang Shao (University of California, Berkeley)
Xiaoyu Huang (University of California, Berkeley)
Bike Zhang (University of California, Berkeley)
Qiayuan Liao (University of California, Berkeley)
Yuman Gao (Zhejiang University)
Yufeng Chi (University of California, Berkeley)
Zhongyu Li (University of California, Berkeley)
Sophia Shao (University of California, Berkeley)
K. Sreenath (University of California, Berkeley)