🤖 AI Summary
Real-world humanoid reinforcement learning (RL) faces critical bottlenecks: high safety risks, difficult reward design, low sample efficiency, and frequent human intervention. To address these, we propose Robot-Trains-Robot (RTR), a novel framework that uses a robotic arm as a “teacher” to provide real-time physical safeguarding, adaptive perturbations, failure detection, autonomous resets, and dynamic curriculum guidance for a humanoid “student.” The framework integrates latent-variable-driven sim-to-real transfer with real-world reward optimization. To our knowledge, this is the first method enabling fully autonomous, arm-guided humanoid RL, from learning from scratch to policy fine-tuning, without manual supervision. Experiments demonstrate efficient real-world execution of a humanoid swing-up task (learned from scratch) and precise velocity-tracked walking (policy fine-tuning), significantly improving training safety, autonomy, and generalization. RTR overcomes longstanding barriers to sustained, unsupervised humanoid RL in physical environments.
📝 Abstract
Simulation-based reinforcement learning (RL) has significantly advanced humanoid locomotion, yet direct real-world RL, whether from scratch or by adapting pretrained policies, remains rare, limiting the full potential of humanoid robots. Real-world learning, despite being crucial for overcoming the sim-to-real gap, faces substantial challenges in safety, reward design, and learning efficiency. To address these limitations, we propose Robot-Trains-Robot (RTR), a novel framework in which a robotic-arm teacher actively supports and guides a humanoid-robot student. The RTR system provides protection, a learning schedule, rewards, perturbations, failure detection, and automatic resets, enabling efficient long-term real-world humanoid training with minimal human intervention. Furthermore, we propose a novel RL pipeline that facilitates and stabilizes sim-to-real transfer by optimizing a single dynamics-encoded latent variable in the real world. We validate our method on two challenging real-world humanoid tasks: fine-tuning a walking policy for precise speed tracking and learning a humanoid swing-up task from scratch, illustrating the promising capabilities of real-world humanoid learning realized by RTR-style systems. See https://robot-trains-robot.github.io/ for more info.
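One way to read the abstract's idea of "optimizing a single dynamics-encoded latent variable in the real world" is as a low-dimensional black-box search: the policy is conditioned on a latent vector, and episodic reward from real rollouts drives the search over that vector rather than over all policy weights. The sketch below illustrates this with a simple evolution-strategies update; the toy reward function, dimensions, and hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rollout_reward(z, z_true=np.array([0.3, -0.2])):
    """Stand-in for a real-world episode with the policy conditioned on z.

    Here a toy quadratic objective peaked at an unknown "true dynamics"
    latent z_true; in practice this would be a physical rollout's return.
    """
    return -float(np.sum((z - z_true) ** 2))

def optimize_latent(z0, iters=200, pop=8, sigma=0.1, lr=0.05, seed=0):
    """Evolution-strategies search over the single latent vector z."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z0, dtype=float).copy()
    for _ in range(iters):
        # Sample a population of perturbations around the current latent.
        eps = rng.standard_normal((pop, z.size))
        rewards = np.array([rollout_reward(z + sigma * e) for e in eps])
        # Centered rewards weight each perturbation direction.
        adv = rewards - rewards.mean()
        z += lr / (pop * sigma) * (adv[:, None] * eps).sum(axis=0)
    return z
```

Because only a handful of latent dimensions are searched, this kind of scheme needs far fewer real rollouts than fine-tuning full network weights, which is what makes it plausible for on-hardware adaptation.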