CLAW: Composable Language-Annotated Whole-body Motion Generation

2026-04-13

AI Summary
Training language-conditioned whole-body controllers for humanoid robots is held back by a lack of large-scale, diverse, and physically plausible language-motion paired data. To address this gap, the work proposes a simulation-based, interactive data generation system that combines composable motion primitives with a low-level whole-body controller. Motions are authored in two modes, real-time keyboard input and a timeline-based sequence editor, and a template-based annotation engine describes each motion in multiple language styles. Implemented in MuJoCo for the Unitree G1 humanoid, the system records physically grounded whole-body trajectories at 50 Hz, each paired with natural-language descriptions. The system is released as open source to support scalable generation of language-motion data for language-guided humanoid robot learning.
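The composable-primitive idea can be made concrete with a small sketch. It assumes, hypothetically, that each motion block carries the five parameters listed in the paper's abstract (movement, heading, speed, pelvis height, duration) and that blocks placed on a timeline are converted into frame ranges at the reported 50 Hz recording rate. The `Segment` class, its field names and units, and the helpers `build_schedule` and `segment_at` are illustrative inventions, not CLAW's code or API.

```python
from dataclasses import dataclass
from typing import List, Tuple

RECORD_HZ = 50  # recording rate reported for CLAW; everything else here is illustrative


@dataclass(frozen=True)
class Segment:
    """One composable motion block (hypothetical field names, types and units)."""
    movement: str           # illustrative mode label, e.g. "walk" or "crouch"
    heading_deg: float      # commanded heading in degrees
    speed_mps: float        # commanded speed in metres per second
    pelvis_height_m: float  # commanded pelvis height in metres
    duration_s: float       # segment length in seconds


def build_schedule(segments: List[Segment], hz: int = RECORD_HZ) -> List[Tuple[int, int, Segment]]:
    """Place segments end to end and return half-open frame ranges [start, end)."""
    schedule = []
    start = 0
    for seg in segments:
        n_frames = round(seg.duration_s * hz)
        if n_frames <= 0:
            raise ValueError(f"segment shorter than one frame at {hz} Hz: {seg}")
        schedule.append((start, start + n_frames, seg))
        start += n_frames
    return schedule


def segment_at(schedule: List[Tuple[int, int, Segment]], frame: int) -> Segment:
    """Return the segment whose frame range contains `frame`."""
    for start, end, seg in schedule:
        if start <= frame < end:
            return seg
    raise IndexError(f"frame {frame} is outside the timeline")


# Example timeline (values are made up): 2.0 s + 1.5 s + 1.0 s of motion.
timeline = [
    Segment("walk", 0.0, 0.8, 0.75, 2.0),
    Segment("turn", 90.0, 0.4, 0.75, 1.5),
    Segment("crouch", 90.0, 0.0, 0.45, 1.0),
]
sched = build_schedule(timeline)
for start, end, seg in sched:
    print(seg.movement, start, end)
# walk 0 100
# turn 100 175
# crouch 175 225
print(segment_at(sched, 150).movement)  # turn
```

Half-open frame ranges let consecutive segments share a boundary without overlapping, so the timeline length is simply the sum of per-segment frame counts (225 frames, or 4.5 s, in this example).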

πŸ“ Abstract
Training language-conditioned whole-body controllers for humanoid robots requires large-scale datasets pairing motion trajectories with natural-language descriptions. Existing approaches based on motion capture are costly and limited in diversity, while text-to-motion generative models produce purely kinematic outputs that are not guaranteed to be physically feasible. To address this, we present CLAW, an interactive web-based pipeline for scalable generation of language-annotated whole-body motion data for the Unitree G1 humanoid robot. CLAW treats the motion modes of a kinematic planner as composable building blocks, each parameterized by movement, heading, speed, pelvis height, and duration, and provides two browser-based interfaces (a real-time keyboard mode and a timeline-based sequence editor) for exploratory and batch data collection. A low-level whole-body controller tracks the planner's kinematic references in MuJoCo simulation, producing physically grounded trajectories recorded at 50 Hz. In parallel, a deterministic template-based annotation engine generates diverse natural-language descriptions at multiple stylistic registers, both for every segment and for the full trajectory. We release the system as open source to support scalable generation of language-motion paired data for humanoid robot learning.
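The abstract states that annotations come from a deterministic, template-based engine with several stylistic registers, applied to each segment and to the whole trajectory. The sketch that follows illustrates that idea under clearly hypothetical assumptions: the register names (`concise`, `descriptive`, `casual`), the templates, the heading sign convention (positive means left), and the helpers `annotate_segment` and `describe_trajectory` are invented for illustration, and pelvis height is left out for brevity. It is not CLAW's actual engine or template set.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Segment:
    movement: str       # illustrative mode label, e.g. "walk"
    heading_deg: float  # heading in degrees; positive = left (assumed convention)
    speed_mps: float    # speed in metres per second
    duration_s: float   # segment length in seconds


# Hypothetical registers and templates; not CLAW's actual template set.
TEMPLATES: Dict[str, List[str]] = {
    "concise": ["{movement} {direction} for {duration:.1f} s", "{movement} {direction}"],
    "descriptive": [
        "The robot {movement_3p} {direction} at {speed:.1f} m/s for {duration:.1f} seconds.",
        "For {duration:.1f} seconds, the humanoid {movement_3p} {direction} at about {speed:.1f} m/s.",
    ],
    "casual": ["{movement} {direction} for a bit", "just {movement} {direction}"],
}
VERB_3P = {"walk": "walks", "run": "runs", "turn": "turns", "crouch": "crouches"}


def direction_phrase(heading_deg: float) -> str:
    """Map a heading to a coarse phrase (the 15-degree dead band is an arbitrary choice)."""
    if abs(heading_deg) < 15.0:
        return "forward"
    return "to the left" if heading_deg > 0 else "to the right"


def annotate_segment(seg: Segment, index: int) -> Dict[str, str]:
    """Return one description per register. The template is picked by segment
    index, so the same input always yields the same text (no randomness)."""
    fields = {
        "movement": seg.movement,
        "movement_3p": VERB_3P.get(seg.movement, seg.movement + "s"),
        "direction": direction_phrase(seg.heading_deg),
        "speed": seg.speed_mps,
        "duration": seg.duration_s,
    }
    out = {}
    for register, templates in TEMPLATES.items():
        text = templates[index % len(templates)].format(**fields)
        out[register] = text[0].upper() + text[1:]  # capitalize first letter only
    return out


def describe_trajectory(segments: List[Segment]) -> str:
    """Chain short per-segment phrases into one trajectory-level description."""
    if not segments:
        raise ValueError("empty trajectory")
    parts = [f"{s.movement} {direction_phrase(s.heading_deg)}" for s in segments]
    if len(parts) == 1:
        return parts[0][0].upper() + parts[0][1:] + "."
    if len(parts) == 2:
        return f"First {parts[0]}, then {parts[1]}."
    return "First " + ", then ".join(parts[:-1]) + ", and finally " + parts[-1] + "."


segs = [Segment("walk", 0.0, 0.8, 2.0), Segment("turn", 90.0, 0.4, 1.5)]
for i, s in enumerate(segs):
    print(annotate_segment(s, i))
print(describe_trajectory(segs))  # First walk forward, then turn to the left.
```

Choosing the template by segment index rather than by an unseeded random draw is one simple way to keep the output reproducible while still varying the phrasing across segments; a seeded `random.Random` would be another option.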
Problem

Research questions and friction points this paper is trying to address.

language-conditioned motion
humanoid robots
motion capture
text-to-motion generation
physically feasible motion
Innovation

Methods, ideas, or system contributions that make the work stand out.

composable motion generation
language-conditioned control
humanoid robot simulation
interactive motion authoring
physically feasible trajectories