TextOp: Real-time Interactive Text-Driven Humanoid Robot Motion Generation and Control

📅 2026-02-07
🤖 AI Summary
This work addresses the problem of driving a general-purpose humanoid robot in real time from interactive, flexible user intent while still letting it execute autonomously. The authors propose TextOp, a two-level framework: a high-level autoregressive motion diffusion model generates short-horizon full-body motion trajectories from streaming text instructions in real time, and a low-level robust tracking policy executes those trajectories on a physical robot. TextOp supports modifying instructions during execution, which allows free-form intent expression and seamless switching among multiple behaviors. Real-robot experiments show immediate responsiveness, smooth motion, and precise control, including fluid transitions between complex actions such as dancing and jumping.

📝 Abstract
Recent advances in humanoid whole-body motion tracking have enabled the execution of diverse and highly coordinated motions on real hardware. However, existing controllers are commonly driven either by predefined motion trajectories, which offer limited flexibility when user intent changes, or by continuous human teleoperation, which requires constant human involvement and limits autonomy. This work addresses the problem of how to drive a universal humanoid controller in a real-time and interactive manner. We present TextOp, a real-time text-driven humanoid motion generation and control framework that supports streaming language commands and on-the-fly instruction modification during execution. TextOp adopts a two-level architecture in which a high-level autoregressive motion diffusion model continuously generates short-horizon kinematic trajectories conditioned on the current text input, while a low-level motion tracking policy executes these trajectories on a physical humanoid robot. By bridging interactive motion generation with robust whole-body control, TextOp unlocks free-form intent expression and enables smooth transitions across multiple challenging behaviors such as dancing and jumping, within a single continuous motion execution. Extensive real-robot experiments and offline evaluations demonstrate instant responsiveness, smooth whole-body motion, and precise control. The project page and the open-source code are available at https://text-op.github.io/
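
The two-level loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the diffusion model and the tracking policy are replaced by simple stand-in functions on a 1-D "pose", and every name and constant here (`generate_chunk`, `track`, `run`, `HORIZON`, `TARGETS`, the gains) is a hypothetical chosen for illustration. It only shows the control flow: each short-horizon chunk is generated from the text that is current at that moment and continues from the end of the previous chunk, so a changed instruction takes effect at the next chunk without a discontinuity.

```python
from typing import Dict, List

HORIZON = 4  # assumed number of frames per generated chunk (not from the paper)

# Toy 1-D pose targets per text command; purely illustrative.
TARGETS: Dict[str, float] = {"stand": 0.0, "dance": 1.0, "jump": 2.0}


def generate_chunk(text: str, last_ref: float, horizon: int = HORIZON) -> List[float]:
    """Stand-in for the high-level generator.

    Emits a short reference trajectory that continues from the last
    reference frame and moves halfway toward the target for `text`
    at every frame (autoregressive on the previous output).
    """
    target = TARGETS[text]
    chunk = []
    pose = last_ref
    for _ in range(horizon):
        pose += 0.5 * (target - pose)
        chunk.append(pose)
    return chunk


def track(state: float, reference: float, gain: float = 0.8) -> float:
    """Stand-in for the low-level tracking policy: a proportional step."""
    return state + gain * (reference - state)


def run(commands: List[str]) -> List[float]:
    """commands[i] is the text that is active when chunk i is generated."""
    state = 0.0     # simulated robot state
    last_ref = 0.0  # last reference frame, used to condition the next chunk
    history = []
    for text in commands:
        chunk = generate_chunk(text, last_ref)
        for ref in chunk:
            state = track(state, ref)
            history.append(state)
        last_ref = chunk[-1]
    return history


if __name__ == "__main__":
    # The instruction changes twice during a single continuous execution.
    for s in run(["stand", "dance", "dance", "jump"]):
        print(round(s, 3))
```

Because each chunk starts from the previous chunk's last frame, switching from "dance" to "jump" in this toy produces a gradual change in the tracked state rather than a jump in the reference.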
Problem

Research questions and friction points this paper is trying to address.

humanoid robot
real-time control
interactive motion generation
text-driven control
whole-body motion
Innovation

Methods, ideas, or system contributions that make the work stand out.

text-driven control
real-time motion generation
humanoid robot
interactive motion control
motion diffusion model