Learning Sim-to-Real Humanoid Locomotion in 15 Minutes

📅 2025-12-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing challenges in sim-to-real locomotion control for humanoid robots—including high-dimensional action spaces, substantial domain gaps, and prohibitively long training times—this paper proposes a lightweight, end-to-end reinforcement learning framework. The method improves the stability and convergence speed of off-policy RL (FastSAC/FastTD3) under aggressive domain randomization via streamlined reward shaping, careful hyperparameter tuning, and massively parallel simulation. Training completes in just 15 minutes on a single RTX 4090 GPU, yielding robust gait transfer across complex terrains and under external disturbances. The approach is successfully deployed on Unitree G1 and Booster T1 humanoid platforms, enabling full-body motion tracking and rapid policy generalization. By drastically reducing computational and hardware requirements while maintaining real-time performance, the framework establishes a scalable recipe for resource-constrained, high-fidelity humanoid control.
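The "streamlined reward shaping" mentioned above can be illustrated with a minimal sketch. The function name, reward terms, and weights below are illustrative assumptions, not taken from the paper: a typical minimalist locomotion reward combines a velocity-tracking term, an alive bonus, and an energy penalty.

```python
import numpy as np

def locomotion_reward(lin_vel, cmd_vel, joint_torques,
                      w_track=1.0, w_alive=0.5, w_energy=1e-4):
    """Minimalist locomotion reward (illustrative):
    velocity tracking + alive bonus - actuation-energy cost."""
    # Exponential kernel: 1.0 at perfect tracking, decays with squared error.
    track = np.exp(-np.sum((lin_vel - cmd_vel) ** 2))
    # Penalize large joint torques to encourage efficient gaits.
    energy = np.sum(np.square(joint_torques))
    return w_track * track + w_alive - w_energy * energy
```

With perfect velocity tracking and zero torques, the reward reduces to the tracking weight plus the alive bonus, which makes the relative scale of each term easy to reason about when tuning.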

📝 Abstract
Massively parallel simulation has reduced reinforcement learning (RL) training time for robots from days to minutes. However, achieving fast and reliable sim-to-real RL for humanoid control remains difficult due to the challenges introduced by factors such as high dimensionality and domain randomization. In this work, we introduce a simple and practical recipe based on off-policy RL algorithms, i.e., FastSAC and FastTD3, that enables rapid training of humanoid locomotion policies in just 15 minutes with a single RTX 4090 GPU. Our simple recipe stabilizes off-policy RL algorithms at massive scale with thousands of parallel environments through carefully tuned design choices and minimalist reward functions. We demonstrate rapid end-to-end learning of humanoid locomotion controllers on Unitree G1 and Booster T1 robots under strong domain randomization, e.g., randomized dynamics, rough terrain, and push perturbations, as well as fast training of whole-body human-motion tracking policies. We provide videos and open-source implementation at: https://younggyo.me/fastsac-humanoid.
Problem

Research questions and friction points this paper is trying to address.

Sim-to-real RL for humanoids is hampered by high-dimensional action spaces and large domain gaps
Off-policy RL algorithms become unstable at massive scale with thousands of parallel environments
Training robust controllers under strong domain randomization typically demands long training times and heavy compute
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes FastSAC and FastTD3 off-policy RL algorithms
Stabilizes training with thousands of parallel environments
Employs minimalist reward functions and tuned design choices
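The abstract lists the randomization axes used for robustness: randomized dynamics, rough terrain, and push perturbations. A minimal sketch of per-environment domain randomization is shown below; the parameter names and ranges are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sample_domain_randomization(rng):
    """Sample per-episode physics parameters (ranges are illustrative)."""
    return {
        "friction": rng.uniform(0.4, 1.2),        # ground friction coefficient
        "mass_scale": rng.uniform(0.9, 1.1),      # link-mass multiplier
        "motor_strength": rng.uniform(0.8, 1.2),  # actuator torque multiplier
        "push_force": rng.uniform(0.0, 50.0),     # external push magnitude (N)
    }

rng = np.random.default_rng(0)
# One independent parameter set per parallel environment.
params = [sample_domain_randomization(rng) for _ in range(4096)]
```

Drawing a fresh parameter set for each of the thousands of parallel environments is what makes the batch of experience diverse enough for a single policy to transfer to real hardware.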