🤖 AI Summary
This work addresses fall recovery for humanoid robots in unstructured environments, where existing reinforcement learning approaches often lack explicit modeling of balance dynamics. The authors propose a method that incorporates classical balance metrics (capture point, center-of-mass state, and centroidal momentum) as privileged inputs to the critic and uses them to design shaping rewards. Because the actor relies solely on proprioceptive feedback, the approach transfers zero-shot to real hardware. By embedding interpretable principles of balance control, the method learns a single, physically consistent recovery policy that generalizes across the full spectrum of disturbances, from minor perturbations to multi-contact falls. Evaluated on the Unitree H1-2 platform, the policy attains a 93.4% success rate in random fall recovery. Ablation studies confirm the critical role of the balance-aware architecture in policy learning, and the approach is further validated through sim-to-sim transfer and preliminary real-world deployment.
📝 Abstract
Humanoid robots remain vulnerable to falls and unrecoverable failure states, limiting their practical utility in unstructured environments. While reinforcement learning has demonstrated stand-up behaviors, existing approaches treat recovery as a pure task-reward problem without an explicit representation of the balance state. We present a unified RL policy that addresses this limitation by embedding classical balance metrics (capture point, center-of-mass state, and centroidal momentum) as privileged critic inputs and by shaping rewards directly around these quantities during training, while the actor relies solely on proprioception for zero-shot hardware transfer. Without reference trajectories or scripted contacts, a single policy spans the full recovery spectrum: ankle and hip strategies for small disturbances, corrective stepping under large pushes, and compliant falling with multi-contact stand-up using the hands, elbows, and knees. Trained on the Unitree H1-2 in Isaac Lab, the policy achieves a 93.4% recovery rate across randomized initial poses and unscripted fall configurations. An ablation study shows that removing the balance-informed structure causes stand-up learning to fail entirely, confirming that these metrics provide a meaningful learning signal rather than incidental structure. Sim-to-sim transfer to MuJoCo and preliminary hardware experiments further demonstrate cross-environment generalization. These results show that embedding interpretable balance structure into the learning framework substantially reduces time spent in failure states and broadens the envelope of autonomous recovery.
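To make the balance-metric machinery concrete, the sketch below computes the classical capture point of the linear inverted pendulum, x_cp = x_com + v_com · sqrt(z_com / g), and a Gaussian-shaped reward on its distance to the support-polygon center. The paper does not publish its exact reward terms; the function names, the `sigma` width, and the reward form here are illustrative assumptions, not the authors' implementation.

```python
import math


def capture_point(com_pos_xy, com_vel_xy, com_height, g=9.81):
    """Capture point of the linear inverted pendulum model:
    x_cp = x_com + v_com * sqrt(z_com / g)."""
    tau = math.sqrt(com_height / g)  # pendulum time constant
    return [p + v * tau for p, v in zip(com_pos_xy, com_vel_xy)]


def capture_point_reward(cp_xy, support_center_xy, sigma=0.25):
    """Hypothetical shaping reward (an assumption, not the paper's term):
    exponential of the negative squared distance between the capture
    point and the support-polygon center, peaking at 1 when aligned."""
    d2 = sum((a - b) ** 2 for a, b in zip(cp_xy, support_center_xy))
    return math.exp(-d2 / sigma**2)


# Example: CoM at the origin, 1 m/s forward velocity, ~0.98 m height.
cp = capture_point([0.0, 0.0], [1.0, 0.0], com_height=0.981)
r = capture_point_reward(cp, support_center_xy=[0.0, 0.0])
```

In the asymmetric actor-critic setup the abstract describes, quantities like `cp` would be appended to the critic's privileged observation, while the actor sees only proprioception.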