Robot Trains Robot: Automatic Real-World Policy Adaptation and Learning for Humanoids

📅 2025-08-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Real-world humanoid reinforcement learning (RL) faces critical bottlenecks: high safety risks, challenging reward design, low sample efficiency, and frequent human intervention. To address these, we propose Robot-Trains-Robot (RTR), a novel framework that uses a robotic arm as a "teacher" to provide real-time physical safeguarding, adaptive perturbations, failure detection, autonomous resets, and dynamic curriculum guidance for a humanoid "student." The framework integrates latent-variable-driven automatic reward generation with simulation-to-reality transfer. To our knowledge, this is the first method enabling fully autonomous, arm-guided humanoid RL, from learning from scratch to policy fine-tuning, without manual supervision. Experiments demonstrate efficient real-world execution of swing-leg lifting (learned from scratch) and precise velocity-controlled walking (policy refinement), significantly improving training safety, autonomy, and generalization. RTR overcomes longstanding barriers to sustained, unsupervised humanoid RL deployment in physical environments.

📝 Abstract
Simulation-based reinforcement learning (RL) has significantly advanced humanoid locomotion tasks, yet direct real-world RL from scratch or adapting from pretrained policies remains rare, limiting the full potential of humanoid robots. Real-world learning, despite being crucial for overcoming the sim-to-real gap, faces substantial challenges related to safety, reward design, and learning efficiency. To address these limitations, we propose Robot-Trains-Robot (RTR), a novel framework where a robotic arm teacher actively supports and guides a humanoid robot student. The RTR system provides protection, learning schedule, reward, perturbation, failure detection, and automatic resets. It enables efficient long-term real-world humanoid training with minimal human intervention. Furthermore, we propose a novel RL pipeline that facilitates and stabilizes sim-to-real transfer by optimizing a single dynamics-encoded latent variable in the real world. We validate our method through two challenging real-world humanoid tasks: fine-tuning a walking policy for precise speed tracking and learning a humanoid swing-up task from scratch, illustrating the promising capabilities of real-world humanoid learning realized by RTR-style systems. See https://robot-trains-robot.github.io/ for more info.
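The abstract's key transfer idea is adapting only a single dynamics-encoded latent variable on real hardware rather than the full policy. The sketch below illustrates that idea with a simple evolution-strategy search over a low-dimensional latent vector; the latent dimension, the rollout interface, and the synthetic return function are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rollout_return(z, target=None):
    """Stand-in for a real-world episode: returns a scalar return for latent z.
    A synthetic quadratic objective is used here so the sketch is runnable;
    on hardware this would be the measured episode return (hypothetical)."""
    if target is None:
        # Pretend the real dynamics are best explained by z near 0.5.
        target = np.full_like(z, 0.5)
    return -np.sum((z - target) ** 2)

def optimize_latent(dim=8, iters=200, pop=16, sigma=0.1, lr=0.05, seed=0):
    """Derivative-free evolution strategy over the latent z alone, matching
    the idea of adapting one low-dimensional variable in the real world
    instead of updating all policy weights."""
    rng = np.random.default_rng(seed)
    z = np.zeros(dim)
    for _ in range(iters):
        # Perturb the latent, evaluate each perturbation with one rollout.
        eps = rng.standard_normal((pop, dim))
        returns = np.array([rollout_return(z + sigma * e) for e in eps])
        # Normalize returns and take a gradient-like step on z only.
        adv = (returns - returns.mean()) / (returns.std() + 1e-8)
        z = z + lr / (pop * sigma) * eps.T @ adv
    return z

best_z = optimize_latent()
```

Because only one latent vector is searched, each update needs just a handful of real rollouts, which is what makes this kind of on-hardware adaptation sample-efficient enough to be practical.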
Problem

Research questions and friction points this paper is trying to address.

Overcoming sim-to-real gap in humanoid robot learning
Addressing safety and efficiency in real-world RL training
Enabling autonomous humanoid policy adaptation with robotic guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Robotic arm teacher guides humanoid student
Optimizes a dynamics-encoded latent variable for sim-to-real transfer
Enables long-term real-world training autonomously