AI Summary
Problem: Existing humanoid locomotion controllers generalize poorly. Each platform requires labor-intensive reward shaping, physical-parameter tuning, and hyperparameter optimization, which hinders cross-morphology transfer.
Method: We propose H-Zero, the first pre-training framework for universal bipedal locomotion policies across diverse humanoid morphologies. It integrates deep reinforcement learning with cross-morphology policy distillation: a base policy is trained jointly on a limited set of embodiments in a multi-robot simulation environment, then adapted to each new robot with minimal fine-tuning.
Contribution/Results: H-Zero enables zero-shot and few-shot adaptation: unseen humanoids and upright quadrupeds can be brought up within 30 minutes of fine-tuning. On unseen robots in simulation, the pretrained policy sustains up to 81% of the full episode duration. By reducing reliance on hardware-specific tuning and expert intervention, H-Zero lowers deployment overhead and parameter-optimization cost.
Abstract
The rapid advancement of humanoid robotics has intensified the need for robust and adaptable controllers to enable stable and efficient locomotion across diverse platforms. However, developing such controllers remains a significant challenge because existing solutions are tailored to specific robot designs, requiring extensive tuning of reward functions, physical parameters, and training hyperparameters for each embodiment. To address this challenge, we introduce H-Zero, a cross-humanoid locomotion pretraining pipeline that learns a generalizable humanoid base policy. We show that pretraining on a limited set of embodiments enables zero-shot and few-shot transfer to novel humanoid robots with minimal fine-tuning. Evaluations show that the pretrained policy maintains up to 81% of the full episode duration on unseen robots in simulation while enabling few-shot transfer to unseen humanoids and upright quadrupeds within 30 minutes of fine-tuning.
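One way to read the "percentage of the full episode duration" figure is as the mean fraction of the episode length a policy survives before termination. The abstract does not spell out the exact computation, so the sketch below is an assumption: the function name `episode_retention`, the clipping at the episode cap, and the sample rollouts are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def episode_retention(survived_steps: Sequence[int], max_steps: int) -> float:
    """Mean fraction of the full episode length completed before termination.

    Assumed definition for illustration only; not the paper's stated metric.
    """
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    if not survived_steps:
        raise ValueError("need at least one episode")
    # Clip at max_steps so a single rollout never counts for more than 100%.
    fractions = [min(s, max_steps) / max_steps for s in survived_steps]
    return sum(fractions) / len(fractions)


# Hypothetical rollouts: steps survived before a fall, with a 1000-step cap.
print(episode_retention([1000, 500, 750], max_steps=1000))  # 0.75
```

Under this reading, a value of 0.81 would mean that rollouts on an unseen robot last, on average, 81% of the maximum episode length.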