🤖 AI Summary
This work addresses the challenge of transferring reinforcement learning policies to humanoid robots carrying unknown payloads, where model mismatch between simulation and reality hinders deployment. The authors propose a two-stage, gradient-based system identification framework built on a differentiable simulator (MuJoCo XLA): the first stage calibrates the robot’s nominal dynamics from real-world data, and the second identifies the mass distribution of the unknown payload. By explicitly reducing the structured model bias induced by added loads before policy training, the method enables zero-shot policy transfer from simulation to hardware under heavy-load conditions. Simulation and real-world experiments show more precise parameter identification, better motion tracking accuracy, and greater agility and robustness than existing baselines.
📝 Abstract
Humanoid robots deployed in real-world scenarios often need to carry unknown payloads, which introduce significant sim-to-real model mismatch and degrade the effectiveness of simulation-to-reality reinforcement learning methods. To address this challenge, we propose a two-stage gradient-based system identification framework built on the differentiable simulator MuJoCo XLA. The first stage calibrates the nominal robot model using real-world data to reduce intrinsic sim-to-real discrepancies, while the second stage identifies the mass distribution of the unknown payload. By explicitly reducing structured model bias prior to policy training, our approach enables zero-shot transfer of reinforcement learning policies to hardware under heavy-load conditions. Extensive simulation and real-world experiments demonstrate more precise parameter identification, improved motion tracking accuracy, and substantially enhanced agility and robustness compared to existing baselines. Project Page: https://mwondering.github.io/halo-humanoid/
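To make the two-stage idea concrete, here is a minimal JAX sketch of gradient-based system identification on a toy problem. It is not the authors' implementation and does not use MuJoCo XLA: the 1-DoF damped point mass, the constants (`DT`, `M_ROBOT`, `TRUE_FRICTION`, `TRUE_PAYLOAD`), the helper names (`rollout`, `tracking_loss`, `gradient_descent`), the step-force excitation, and the learning rates are all assumptions chosen for illustration. The "real" data is synthesized noise-free from the same toy model.

```python
import jax
import jax.numpy as jnp

DT = 0.01            # integration step [s] (assumed)
M_ROBOT = 2.0        # nominal robot mass [kg], treated as known (assumed)
TRUE_FRICTION = 0.8  # ground-truth damping, used only to synthesize "real" data
TRUE_PAYLOAD = 1.5   # ground-truth payload mass, used only to synthesize data


def rollout(friction, payload_mass, forces):
    """Simulate velocities of a damped point mass with explicit Euler steps."""
    total_mass = M_ROBOT + payload_mass

    def step(v, u):
        v_next = v + DT * (u - friction * v) / total_mass
        return v_next, v_next

    _, velocities = jax.lax.scan(step, jnp.zeros(()), forces)
    return velocities


def tracking_loss(friction, payload_mass, forces, measured):
    """Mean squared error between simulated and measured velocities."""
    return jnp.mean((rollout(friction, payload_mass, forces) - measured) ** 2)


def gradient_descent(loss_fn, init, lr, steps):
    """Plain gradient descent on a single scalar parameter."""
    grad_fn = jax.jit(jax.grad(loss_fn))
    param = jnp.asarray(init, dtype=jnp.float32)
    for _ in range(steps):
        param = param - lr * grad_fn(param)
    return param


# Synthetic stand-ins for hardware logs: a unit step force applied for 5 s.
forces = jnp.ones(500)
v_unloaded = rollout(TRUE_FRICTION, 0.0, forces)
v_loaded = rollout(TRUE_FRICTION, TRUE_PAYLOAD, forces)

# Stage 1: calibrate the nominal model (here, friction) from unloaded data.
friction_hat = gradient_descent(
    lambda b: tracking_loss(b, 0.0, forces, v_unloaded),
    init=0.4, lr=0.5, steps=300)

# Stage 2: freeze the calibrated friction and identify the payload mass.
# The loss is much less sensitive to mass, hence the larger step size.
payload_hat = gradient_descent(
    lambda m: tracking_loss(friction_hat, m, forces, v_loaded),
    init=0.0, lr=10.0, steps=300)

print(float(friction_hat), float(payload_hat))
```

The ordering matters in this sketch: if friction were left uncalibrated, its error would be absorbed into the payload estimate in stage 2. In a full pipeline the toy `rollout` would be replaced by a differentiable simulator rollout, and the scalar parameters by the robot and payload parameters being identified.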