AI Summary
Existing methods for humanoid object manipulation rely on disembodied hand control and short-trajectory assumptions, which limits them on long-horizon, non-vertical, and dynamic trajectories. Method: We propose an end-to-end reinforcement learning controller grounded in humanoid motion representations, trained without paired full-body motion and object-trajectory data, relying solely on lightweight state observations and sparse rewards. Our approach integrates humanoid motion priors, closed-loop control, and a gridded object perception interface. Contribution/Results: The system achieves high success rates in trajectory-following manipulation across 1,200+ heterogeneous objects. Experiments demonstrate state-of-the-art trajectory-following success rates and zero-shot generalization to unseen objects. Moreover, it transfers to arbitrary user-specified trajectories without retraining.
Abstract
We present a method for controlling a simulated humanoid to grasp an object and move it along a desired object trajectory. Due to the challenges in controlling a humanoid with dexterous hands, prior methods often use a disembodied hand and only consider vertical lifts or short trajectories. This limited scope hampers their applicability to the object manipulation required for animation and simulation. To close this gap, we learn a controller that can pick up a large number (>1200) of objects and carry them to follow randomly generated trajectories. Our key insight is to leverage a humanoid motion representation that provides human-like motor skills and significantly speeds up training. Using only simplistic reward, state, and object representations, our method shows favorable scalability on diverse objects and trajectories. For training, we do not need a dataset of paired full-body motion and object trajectories. At test time, we only require the object mesh and desired trajectories for grasping and transporting. To demonstrate the capabilities of our method, we show state-of-the-art success rates in following object trajectories and generalizing to unseen objects. Code and models will be released.
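The abstract does not say how the random trajectories are sampled or how following one is scored. As a rough illustration only, the sketch below assumes a trajectory is a list of 3D object positions produced by integrating a speed-limited random velocity, and that an episode counts as successful when the object stays within a fixed distance of the reference at every step. The function names, the sampling scheme, and the tolerance and speed values are all hypothetical and are not taken from the paper.

```python
import math
import random


def random_trajectory(start, steps=60, dt=1 / 30, max_speed=0.5, seed=0):
    """Hypothetical sampler: integrate a randomly perturbed, speed-limited velocity."""
    rng = random.Random(seed)
    pos = list(start)
    vel = [0.0, 0.0, 0.0]
    traj = [tuple(pos)]
    for _ in range(steps):
        # A random acceleration nudges the velocity at each step.
        vel = [v + rng.uniform(-1.0, 1.0) * dt for v in vel]
        speed = math.sqrt(sum(v * v for v in vel))
        if speed > max_speed:
            # Rescale so the object never moves faster than max_speed.
            vel = [v * max_speed / speed for v in vel]
        pos = [p + v * dt for p, v in zip(pos, vel)]
        traj.append(tuple(pos))
    return traj


def tracking_errors(reference, actual):
    """Per-step Euclidean distance between reference and actual object positions."""
    return [math.dist(r, a) for r, a in zip(reference, actual)]


def is_success(reference, actual, tol=0.1):
    """Assumed criterion: the object stays within `tol` of the reference at every step."""
    return max(tracking_errors(reference, actual)) <= tol


reference = random_trajectory(start=(0.0, 0.0, 1.0))
# Stand-in for a rollout: the reference shifted by a constant 5 cm offset.
rollout = [(x + 0.05, y, z) for x, y, z in reference]
print(is_success(reference, rollout))
```

In a real evaluation the `rollout` positions would come from the simulator rather than from a shifted copy of the reference, and the tolerance would be whatever the benchmark defines.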