🤖 AI Summary
Whole-body control (WBC) policies for humanoid robots that are trained in a single simulator inherit that simulator's inductive bias, its built-in assumptions and limitations, which degrades performance during sim-to-real transfer. To address this, we propose PolySim, a WBC training platform that launches parallel environments from multiple heterogeneous simulators (e.g., MuJoCo, IsaacSim) within a single training run. Training jointly across engines acts as dynamics-level domain randomization, and our theoretical analysis shows that it yields a tighter upper bound on simulator inductive bias than single-simulator training. Experiments show substantially reduced motion-tracking error in sim-to-sim evaluations, including an improvement in execution success of 52.8 on MuJoCo over an IsaacSim baseline, and zero-shot deployment on a real Unitree G1 without additional fine-tuning.
📝 Abstract
Humanoid whole-body control (WBC) policies trained in simulation often suffer from the sim-to-real gap, which fundamentally arises from simulator inductive bias, the inherent assumptions and limitations of any single simulator. These biases lead to nontrivial discrepancies both across simulators and between simulation and the real world. To mitigate the effect of simulator inductive bias, the key idea is to train policies jointly across multiple simulators, encouraging the learned controller to capture dynamics that generalize beyond any single simulator's assumptions. We thus introduce PolySim, a WBC training platform that integrates multiple heterogeneous simulators. PolySim can launch parallel environments from different engines simultaneously within a single training run, thereby realizing dynamics-level domain randomization. Theoretically, we show that PolySim yields a tighter upper bound on simulator inductive bias than single-simulator training. In experiments, PolySim substantially reduces motion-tracking error in sim-to-sim evaluations; for example, on MuJoCo, it improves execution success by 52.8 over an IsaacSim baseline. PolySim further enables zero-shot deployment on a real Unitree G1 without additional fine-tuning, showing effective transfer from simulation to the real world. We will release the PolySim code upon acceptance of this work.
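The idea of mixing engines within one training run can be illustrated with a minimal, self-contained Python sketch. It is not the PolySim implementation and calls no MuJoCo or IsaacSim API: the two "engines" are hypothetical toy stand-ins (a 1-D unit point mass stepped with explicit versus semi-implicit Euler and different damping), and every name in it (`ToyEnv`, `make_mixed_envs`, `collect_batch`, `pd_policy`) is an assumption made for illustration. It only shows how one rollout batch can contain transitions generated under different dynamics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Tuple[float, float]  # (position, velocity) of a 1-D unit point mass


def explicit_euler_step(state: State, force: float,
                        dt: float = 0.01, damping: float = 0.1) -> State:
    """Hypothetical 'engine A': explicit Euler integration, light damping."""
    pos, vel = state
    acc = force - damping * vel
    return (pos + vel * dt, vel + acc * dt)


def semi_implicit_euler_step(state: State, force: float,
                             dt: float = 0.01, damping: float = 0.3) -> State:
    """Hypothetical 'engine B': semi-implicit Euler integration, heavier damping."""
    pos, vel = state
    acc = force - damping * vel
    new_vel = vel + acc * dt
    return (pos + new_vel * dt, new_vel)


@dataclass
class ToyEnv:
    """One environment instance bound to a single toy engine."""
    engine: str
    step_fn: Callable[[State, float], State]
    state: State = (0.0, 0.0)

    def step(self, force: float) -> State:
        self.state = self.step_fn(self.state, force)
        return self.state


def make_mixed_envs(counts: Dict[str, int]) -> List[ToyEnv]:
    """Create the requested number of environments per engine in one pool."""
    registry = {
        "engine_a": explicit_euler_step,
        "engine_b": semi_implicit_euler_step,
    }
    envs: List[ToyEnv] = []
    for name, n in counts.items():
        envs.extend(ToyEnv(name, registry[name]) for _ in range(n))
    return envs


def collect_batch(envs: List[ToyEnv],
                  policy: Callable[[State], float],
                  horizon: int) -> List[Tuple[str, State, float, State]]:
    """Roll the same policy out in every environment and pool the transitions."""
    batch = []
    for _ in range(horizon):
        for env in envs:
            prev = env.state
            action = policy(prev)
            nxt = env.step(action)
            batch.append((env.engine, prev, action, nxt))
    return batch


def pd_policy(state: State, target: float = 1.0) -> float:
    """Simple PD controller standing in for a learned policy."""
    pos, vel = state
    return 20.0 * (target - pos) - 2.0 * vel


envs = make_mixed_envs({"engine_a": 3, "engine_b": 2})
batch = collect_batch(envs, pd_policy, horizon=50)
```

In this toy setting the same action applied from the same state produces different next states depending on the engine, so a learner consuming `batch` sees a mixture of dynamics rather than one simulator's assumptions. A real multi-simulator setup would additionally have to align observation and action spaces, time steps, and robot models across engines, which this sketch does not attempt.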