Learn to Teach: Sample-Efficient Privileged Learning for Humanoid Locomotion over Diverse Terrains

📅 2024-02-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the large simulation-sample requirements and fragile sim-to-real transfer of reinforcement-learning controllers for humanoid robots on diverse terrains, this paper proposes a single-stage “Learn to Teach” (L2T) framework. L2T jointly trains a teacher policy, which has access to privileged simulator information, and a student policy, which relies only on onboard sensing. Both policies learn from the same simulator samples, and their learning trajectories are synchronized through shared dynamics. Key contributions include: (1) a one-stage paradigm that trains teacher and student together rather than in sequence; (2) zero-shot sim-to-real transfer without depth estimation modules or real-world fine-tuning; and (3) substantially reduced sample complexity and training time. Evaluated on the Digit robot over more than 12 challenging terrains, the RL variant (L2T-RL) achieves state-of-the-art, robust locomotion in both simulation and hardware experiments.

📝 Abstract

Humanoid robots promise transformative capabilities for industrial and service applications. While recent advances in Reinforcement Learning (RL) yield impressive results in locomotion, manipulation, and navigation, the proposed methods typically require enormous numbers of simulation samples to account for real-world variability. This work proposes a novel one-stage training framework, Learn to Teach (L2T), which unifies teacher and student policy learning. Our approach recycles simulator samples and synchronizes the learning trajectories through shared dynamics, significantly reducing sample complexity and training time while achieving state-of-the-art performance. Furthermore, we validate the RL variant (L2T-RL) through extensive simulations and hardware tests on the Digit robot, demonstrating zero-shot sim-to-real transfer and robust performance over more than 12 challenging terrains without depth estimation modules.
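The central idea in the abstract, training teacher and student from one stream of simulator samples instead of in two sequential stages, can be illustrated with a toy example. This is a hypothetical sketch assuming linear policies, a synthetic "simulator" in which privileged features are only partly predictable from proprioception, and a known quadratic reward; it is not the authors' L2T or L2T-RL algorithm. In particular, the teacher update is plain gradient ascent standing in for an RL update, and every name (`sample_states`, `train_one_stage`, the dimensions) is made up for illustration.

```python
import numpy as np

# Toy sizes; all names and dimensions are hypothetical, not from the paper.
PROP_DIM, PRIV_DIM, ACT_DIM = 3, 2, 2


def sample_states(rng, n, mix, noise=0.5):
    """Toy 'simulator': privileged features (e.g. terrain parameters) are only
    partly predictable from onboard proprioceptive features."""
    prop = rng.normal(size=(n, PROP_DIM))
    priv = prop @ mix + noise * rng.normal(size=(n, PRIV_DIM))
    return prop, priv


def train_one_stage(iters=500, batch=256, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    mix = rng.normal(size=(PROP_DIM, PRIV_DIM))
    # Ideal action a* = [prop, priv] @ target defines the toy reward -||a - a*||^2.
    target = rng.normal(size=(PROP_DIM + PRIV_DIM, ACT_DIM))
    w_teacher = np.zeros((PROP_DIM + PRIV_DIM, ACT_DIM))  # sees privileged info
    w_student = np.zeros((PROP_DIM, ACT_DIM))             # onboard sensing only
    sim_samples = 0

    for _ in range(iters):
        # One batch of simulator samples, consumed by BOTH updates below.
        prop, priv = sample_states(rng, batch, mix)
        sim_samples += batch
        obs_teacher = np.hstack([prop, priv])
        a_teacher = obs_teacher @ w_teacher

        # Teacher: gradient ascent on the toy reward (a stand-in for an RL update).
        a_star = obs_teacher @ target
        w_teacher += lr * obs_teacher.T @ (a_star - a_teacher) / batch

        # Student: regress onto the teacher's actions on the SAME samples,
        # so imitation costs no additional simulator rollouts.
        a_student = prop @ w_student
        w_student += lr * prop.T @ (a_teacher - a_student) / batch

    # Held-out evaluation of how closely the student tracks the teacher.
    prop, priv = sample_states(rng, 4096, mix)
    obs_teacher = np.hstack([prop, priv])
    a_teacher = obs_teacher @ w_teacher
    return {
        "sim_samples": sim_samples,
        "teacher_gap": float(np.mean((obs_teacher @ target - a_teacher) ** 2)),
        "student_gap": float(np.mean((prop @ w_student - a_teacher) ** 2)),
        "baseline_gap": float(np.mean(a_teacher ** 2)),  # zero-action student
    }


if __name__ == "__main__":
    print(train_one_stage())
```

Calling `train_one_stage()` reports how many simulator samples were drawn together with held-out gaps. `sim_samples` counts only the shared batches, because the student never triggers extra rollouts, and the student's gap to the teacher should fall below the zero-action baseline; what remains is the part of the teacher's behavior that depends on privileged information the student cannot infer. Because the toy draws i.i.d. states, it sidesteps the distribution shift that arises when a student acts in real rollouts.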

Problem

Reduces sample complexity in humanoid robot locomotion training

Achieves zero-shot sim-to-real transfer without depth estimation

Enables robust performance across diverse terrains efficiently

Innovation

One-stage training framework unifies teacher-student learning

Recycles simulator samples to reduce sample complexity and training time

Achieves zero-shot sim-to-real transfer without depth estimation