HoRD: Robust Humanoid Control via History-Conditioned Reinforcement Learning and Online Distillation

📅 2026-02-04

📈 Citations: 0

✨ Influential: 0

career value

196K/year

🤖 AI Summary

Humanoid robots often exhibit limited generalization when confronted with minor variations in dynamics, tasks, or environments. To address this challenge, this work proposes HoRD, a two-stage framework that first trains a teacher policy via history-conditioned reinforcement learning to acquire online adaptation capabilities, and then transfers this adaptability to a Transformer-based student policy through online knowledge distillation. This approach achieves, for the first time, strong zero-shot generalization of a single humanoid control policy across unseen domains, significantly outperforming existing baselines. The resulting policy demonstrates exceptional robustness and transferability under unknown perturbations and cross-domain scenarios, marking a notable advance in adaptive humanoid locomotion and control.

Technology Category

Application Category

📝 Abstract

Humanoid robots can suffer significant performance drops under small changes in dynamics, task specifications, or environment setup. We propose HoRD, a two-stage learning framework for robust humanoid control under domain shift. First, we train a high-performance teacher policy via history-conditioned reinforcement learning, where the policy infers latent dynamics context from recent state--action trajectories to adapt online to diverse randomized dynamics. Second, we perform online distillation to transfer the teacher's robust control capabilities into a transformer-based student policy that operates on sparse root-relative 3D joint keypoint trajectories. By combining history-conditioned adaptation with online distillation, HoRD enables a single policy to adapt zero-shot to unseen domains without per-domain retraining. Extensive experiments show HoRD outperforms strong baselines in robustness and transfer, especially under unseen domains and external perturbations. Code and project page are available at https://tonywang-0517.github.io/hord/.

Problem

Research questions and friction points this paper is trying to address.

humanoid control

domain shift

robustness

dynamics adaptation

zero-shot transfer

Innovation

Methods, ideas, or system contributions that make the work stand out.

history-conditioned reinforcement learning

online distillation

robust humanoid control

domain generalization

transformer-based policy

🔎 Similar Papers

Omnigrasp: Grasping Diverse Objects with Simulated Humanoids

2024-07-16Neural Information Processing SystemsCitations: 16