🤖 AI Summary
This work addresses the problem of keeping unsecured objects stable on a tray during dynamic bipedal walking of humanoid robots, where gait-induced oscillations destabilize the payload. To this end, the authors propose ReST-RL, a hierarchical reinforcement learning architecture that decouples a low-level robust walking policy from a high-level residual perturbation-suppression module, enabling high-precision tray balancing. This approach achieves smooth object transport without compromising bipedal stability and supports zero-shot transfer to real hardware. Experimental results demonstrate a 96.9% success rate in variable-speed trajectory tracking and 74.5% robustness against external disturbances in simulation. Furthermore, the method is successfully deployed in a zero-shot manner on the Unitree G1 humanoid robot, validating its practical efficacy.
📝 Abstract
Stabilizing unsecured payloads against the inherent oscillations of dynamic bipedal locomotion remains a critical engineering bottleneck for humanoids in unstructured environments. To solve this, we introduce ReST-RL, a hierarchical reinforcement learning architecture that explicitly decouples locomotion from payload stabilization, evaluated via the SteadyTray benchmark. Rather than relying on monolithic end-to-end learning, our framework integrates a robust base locomotion policy with a dynamic residual module engineered to actively cancel gait-induced perturbations at the end-effector. This architectural separation ensures steady tray transport without degrading the underlying bipedal stability. In simulation, the residual design significantly outperforms end-to-end baselines in gait smoothness and orientation accuracy, achieving a 96.9% success rate in variable-velocity tracking and 74.5% robustness against external force disturbances. Successfully deployed on the Unitree G1 humanoid hardware, this modular approach demonstrates highly reliable zero-shot sim-to-real generalization across various objects and external force disturbances.
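The core idea above, a base locomotion policy plus a masked, scaled residual correction, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the joint count, which joints the residual may touch, the scaling factor, and the stand-in policies are not the paper's actual implementation.

```python
# Hedged sketch of hierarchical residual action composition (not the
# paper's code): a frozen low-level walking policy outputs nominal joint
# targets, and a high-level residual module adds small corrections that
# are masked to the upper body and scaled so they cannot destabilize
# the gait. All sizes and constants below are illustrative assumptions.

NUM_JOINTS = 29              # assumed DoF for a G1-class humanoid
ARM_JOINTS = range(15, 29)   # assumed upper-body joints the residual may correct

def base_locomotion_policy(obs):
    """Stand-in for the pretrained robust walking policy."""
    return [0.0] * NUM_JOINTS  # nominal joint-position targets

def residual_policy(obs):
    """Stand-in for the residual perturbation-suppression module."""
    return [0.0] * NUM_JOINTS  # small corrective offsets

def compose_action(obs, residual_scale=0.1):
    """Add a masked, scaled residual on top of the base action, so
    tray-stabilizing corrections leave the leg joints untouched."""
    a_base = base_locomotion_policy(obs)
    a_res = residual_policy(obs)
    return [
        b + (residual_scale * r if i in ARM_JOINTS else 0.0)
        for i, (b, r) in enumerate(zip(a_base, a_res))
    ]
```

Masking the residual to the upper body and bounding its magnitude is one common way such a decoupling is enforced; it keeps the verified gait intact while the high-level module only perturbs the end-effector chain.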