HoRD: Robust Humanoid Control via History-Conditioned Reinforcement Learning and Online Distillation

📅 2026-02-04
🤖 AI Summary
Humanoid robots often generalize poorly when dynamics, task specifications, or environment setup change even slightly. To address this, the work proposes HoRD, a two-stage framework. In the first stage, a teacher policy is trained with history-conditioned reinforcement learning so that it can infer the current dynamics from recent state-action trajectories and adapt online. In the second stage, online distillation transfers that adaptability to a transformer-based student policy that operates on sparse root-relative 3D joint keypoint trajectories. The result is a single humanoid control policy that adapts zero-shot to unseen domains without per-domain retraining. In the reported experiments it outperforms strong baselines in robustness and transfer, especially under unseen domains and external perturbations.
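
To make "history-conditioned" concrete, the sketch below shows one way a policy input could be assembled from a sliding window of recent state-action pairs. It is an illustration only: the class name HistoryBuffer, the window length, the zero-padding at episode start, and the flat concatenation are all assumptions, not details taken from the paper.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple


class HistoryBuffer:
    """Fixed-length window of recent (state, action) pairs (hypothetical helper)."""

    def __init__(self, horizon: int, state_dim: int, action_dim: int) -> None:
        self.horizon = horizon
        self.state_dim = state_dim
        self.action_dim = action_dim
        self._pairs: Deque[Tuple[List[float], List[float]]] = deque(maxlen=horizon)
        self.reset()

    def reset(self) -> None:
        """Zero-pad the window so the policy input always has the same size."""
        self._pairs.clear()
        for _ in range(self.horizon):
            self._pairs.append(([0.0] * self.state_dim, [0.0] * self.action_dim))

    def push(self, state: Sequence[float], action: Sequence[float]) -> None:
        """Append the newest pair; deque(maxlen=...) drops the oldest one."""
        assert len(state) == self.state_dim and len(action) == self.action_dim
        self._pairs.append((list(state), list(action)))

    def policy_input(self, current_state: Sequence[float]) -> List[float]:
        """Concatenate the history (oldest first) with the current state."""
        flat: List[float] = []
        for past_state, past_action in self._pairs:
            flat.extend(past_state)
            flat.extend(past_action)
        flat.extend(current_state)
        return flat


buf = HistoryBuffer(horizon=3, state_dim=2, action_dim=1)
buf.push([1.0, 2.0], [0.5])
obs = buf.policy_input([3.0, 4.0])  # length = 3 * (2 + 1) + 2
```

A policy that consumes such an input can, in principle, pick up on how the robot responded to its recent actions and adjust accordingly, which is the intuition behind inferring a latent dynamics context from history.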

📝 Abstract
Humanoid robots can suffer significant performance drops under small changes in dynamics, task specifications, or environment setup. We propose HoRD, a two-stage learning framework for robust humanoid control under domain shift. First, we train a high-performance teacher policy via history-conditioned reinforcement learning, where the policy infers latent dynamics context from recent state-action trajectories to adapt online to diverse randomized dynamics. Second, we perform online distillation to transfer the teacher's robust control capabilities into a transformer-based student policy that operates on sparse root-relative 3D joint keypoint trajectories. By combining history-conditioned adaptation with online distillation, HoRD enables a single policy to adapt zero-shot to unseen domains without per-domain retraining. Extensive experiments show HoRD outperforms strong baselines in robustness and transfer, especially under unseen domains and external perturbations. Code and project page are available at https://tonywang-0517.github.io/hord/.
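
The abstract does not spell out the distillation loss or schedule. One common reading of "online distillation" is a DAgger-style loop: the student acts in the environment, and the teacher labels the states the student actually visits. The toy sketch below illustrates that reading on a 1-D system. The dynamics, the fixed-gain teacher_action stand-in, the scalar student weight, and all hyperparameters are assumptions made for illustration and are not HoRD's implementation.

```python
import random


def teacher_action(x: float) -> float:
    """Stand-in for a trained teacher policy: a fixed stabilizing gain (assumed)."""
    return -1.0 * x


def distill_online(episodes: int = 200, steps: int = 20,
                   lr: float = 0.1, seed: int = 0) -> float:
    """Fit a scalar student policy a = w * x on states reached by the student."""
    rng = random.Random(seed)
    w = 0.0  # student parameter, starts as a do-nothing policy
    for _ in range(episodes):
        x = rng.uniform(-1.0, 1.0)  # random initial state per episode
        for _ in range(steps):
            a_student = w * x
            a_teacher = teacher_action(x)  # teacher labels the visited state
            # Gradient step on the squared imitation error (w * x - a_teacher)^2.
            w -= lr * 2.0 * (a_student - a_teacher) * x
            # The environment advances with the student's action, not the
            # teacher's; this is what makes the distillation "online".
            x = x + 0.1 * a_student
    return w


student_gain = distill_online()
```

Because the training states come from the student's own rollouts, the student is supervised on the distribution it will face at deployment, which is the usual motivation for online rather than offline distillation.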
Problem

Research questions and friction points this paper is trying to address.

humanoid control
domain shift
robustness
dynamics adaptation
zero-shot transfer
Innovation

Methods, ideas, or system contributions that make the work stand out.

history-conditioned reinforcement learning
online distillation
robust humanoid control
domain generalization
transformer-based policy