🤖 AI Summary
Humanoid robots often exhibit limited generalization when confronted with minor variations in dynamics, tasks, or environments. To address this challenge, this work proposes HoRD, a two-stage framework that first trains a teacher policy via history-conditioned reinforcement learning to acquire online adaptation capabilities, and then transfers this adaptability to a Transformer-based student policy through online knowledge distillation. This approach achieves, for the first time, strong zero-shot generalization of a single humanoid control policy across unseen domains, significantly outperforming existing baselines. The resulting policy demonstrates exceptional robustness and transferability under unknown perturbations and cross-domain scenarios, marking a notable advance in adaptive humanoid locomotion and control.
📝 Abstract
Humanoid robots can suffer significant performance drops under small changes in dynamics, task specifications, or environment setup. We propose HoRD, a two-stage learning framework for robust humanoid control under domain shift. First, we train a high-performance teacher policy via history-conditioned reinforcement learning, where the policy infers latent dynamics context from recent state--action trajectories to adapt online to diverse randomized dynamics. Second, we perform online distillation to transfer the teacher's robust control capabilities into a transformer-based student policy that operates on sparse root-relative 3D joint keypoint trajectories. By combining history-conditioned adaptation with online distillation, HoRD enables a single policy to adapt zero-shot to unseen domains without per-domain retraining. Extensive experiments show HoRD outperforms strong baselines in robustness and transfer, especially under unseen domains and external perturbations. Code and project page are available at https://tonywang-0517.github.io/hord/.