🤖 AI Summary
This work addresses the challenge of effectively leveraging human motion data for robot motor control. We propose a zero-shot transfer framework that trains a predictive model solely on human keypoint trajectories and directly employs it for end-effector trajectory control—without gradient-based retargeting, adversarial training, handcrafted dense rewards, or curriculum learning. Our method integrates sensor-state history encoding, sparse-reward reinforcement learning, and keypoint trajectory tracking to enable end-to-end optimization of perception–action policies. By avoiding the poor utilization of large-scale interaction data and the heavy reliance on reward engineering that limit prior approaches, the framework achieves significant performance gains over baselines across diverse robotic platforms and tasks, including manipulation and locomotion. Experimental results demonstrate its generality, effectiveness, and engineering feasibility.
📝 Abstract
As the embodiment gap between a robot and a human narrows, new opportunities arise to leverage datasets of humans interacting with their surroundings for robot learning. We propose a novel technique for training sensorimotor policies with reinforcement learning by imitating predictive models of human motions. Our key insight is that the motion of keypoints on human-inspired robot end-effectors closely mirrors the motion of corresponding human body keypoints. This enables us to use a model trained to predict future motion on human data *zero-shot* on robot data. We train sensorimotor policies to track the predictions of such a model, conditioned on a history of past robot states, while optimizing a relatively sparse task reward. This approach entirely bypasses gradient-based kinematic retargeting and adversarial losses, which limit existing methods from fully leveraging the scale and diversity of modern human-scene interaction datasets. Empirically, we find that our approach works across robots and tasks, outperforming existing baselines by a large margin. In addition, we find that tracking a human motion model can substitute for carefully designed dense rewards and curricula in manipulation tasks. Code, data, and qualitative results are available at https://jirl-upenn.github.io/track_reward/.
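The core reward structure described above — a dense term for tracking the human-motion model's predicted keypoints, combined with a sparse task reward — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name `step_reward`, the exponential tracking shaping, and the specific scales are all assumptions for the example.

```python
import numpy as np

def step_reward(predicted_keypoints, robot_keypoints, task_success,
                tracking_scale=1.0, success_bonus=10.0):
    """Combine keypoint-tracking reward with a sparse task reward.

    predicted_keypoints : (K, 3) array of future keypoint positions from a
        motion model trained on human data, applied zero-shot to the robot.
    robot_keypoints     : (K, 3) array of the matching keypoints on the
        robot end-effector at the current step.
    task_success        : bool, the sparse task signal (e.g. object lifted).
    """
    # Dense tracking term: mean Euclidean distance between the model's
    # predicted keypoints and the robot's actual keypoints, shaped into
    # a bounded reward in (0, 1].
    tracking_error = np.linalg.norm(
        predicted_keypoints - robot_keypoints, axis=-1).mean()
    tracking_reward = np.exp(-tracking_scale * tracking_error)

    # Sparse task term: a bonus granted only when the task criterion is met,
    # standing in for hand-designed dense rewards and curricula.
    return tracking_reward + (success_bonus if task_success else 0.0)
```

A policy conditioned on a history of past robot states would be trained with RL to maximize the discounted sum of such per-step rewards, so that following the human-motion prior and completing the task are optimized jointly.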