🤖 AI Summary
Existing sim-to-real transfer methods predominantly rely on fixed-parameter domain randomization, which struggles to cover complex, unseen reality gaps and thus limits the generalization of bipedal locomotion policies. To address this, we propose State-Dependent Torque Perturbation (SDTP), a technique that injects state-conditioned disturbances directly into the torque space during simulation training, explicitly modeling the nonlinear physical discrepancies present in real-world environments. SDTP is the first approach to shift perturbation from the conventional parameter domain to the torque domain, enabling robust policy learning via reinforcement learning integrated with forward dynamics simulation, and we rigorously evaluate its out-of-distribution generalization. Experiments demonstrate that SDTP significantly improves transfer performance on real humanoid robots, achieving stable walking under diverse, unseen deviations in mass, friction, and actuator latency, and outperforming standard domain randomization baselines in robustness.
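Concretely, in our own notation (not necessarily the paper's), the torque fed to the forward dynamics during training is the policy's command plus a state-conditioned offset:

$$\tau_{\text{sim}} = \tau_{\pi}(s) + \delta(s;\,\phi)$$

where s is the robot state, τ_π the torque commanded by the policy, and δ(·; φ) a perturbation function whose parameters φ are varied during training. Fixed-parameter domain randomization instead perturbs physical parameters (masses, friction coefficients, latencies) while leaving the commanded torque unchanged.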
📝 Abstract
This paper proposes a novel alternative to existing sim-to-real methods for training control policies with simulated experience. Prior sim-to-real methods for legged robots mostly rely on domain randomization, in which a fixed, finite set of simulation parameters is randomized during training. Instead, our method adds state-dependent perturbations to the input joint torque used for forward simulation during the training phase. These state-dependent perturbations are designed to simulate a broader range of reality gaps than can be captured by randomizing a fixed set of simulation parameters. Experimental results show that our method produces humanoid locomotion policies that are more robust to complex reality gaps unseen in the training domain.
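To make the mechanism concrete, below is a minimal Python sketch, under our own assumptions, of how such perturbations might be injected during training. The perturbation model (a small random network resampled per episode), its scale, and the `sim`/`policy` interfaces are illustrative placeholders, not the paper's implementation.

```python
"""Sketch: state-dependent torque perturbation during simulated training.

All names (perturbation scale, network shape, sim/policy API) are assumed
for illustration; they do not reproduce the paper's exact method.
"""
import numpy as np

rng = np.random.default_rng(0)


class StateDependentTorquePerturbation:
    """Random, state-conditioned torque offset, resampled per episode.

    A small random network maps the robot state to a bounded torque offset,
    so the disturbance varies nonlinearly with the state rather than being
    a fixed parameter draw as in standard domain randomization.
    """

    def __init__(self, state_dim: int, num_joints: int, scale: float = 2.0):
        self.scale = scale  # assumed max perturbation magnitude (N*m)
        # Weights resampled per episode so each rollout sees a different reality gap.
        self.w1 = rng.normal(0.0, 1.0, (state_dim, 32))
        self.w2 = rng.normal(0.0, 1.0, (32, num_joints))

    def __call__(self, state: np.ndarray) -> np.ndarray:
        h = np.tanh(state @ self.w1)
        return self.scale * np.tanh(h @ self.w2)  # bounded, state-dependent offset


def simulation_step(sim, policy, perturb):
    """One control step: the policy's torque is perturbed before forward dynamics.

    `sim` and `policy` are hypothetical interfaces standing in for the simulator
    and the learned controller.
    """
    state = sim.get_state()                    # joint positions/velocities, base state
    tau_policy = policy(state)                 # torque command from the policy
    tau_applied = tau_policy + perturb(state)  # inject state-dependent disturbance
    return sim.forward(tau_applied)            # advance dynamics with perturbed torque


if __name__ == "__main__":
    # Standalone demo: the same perturbation object gives different offsets
    # for different states, unlike a fixed parameter offset.
    perturb = StateDependentTorquePerturbation(state_dim=36, num_joints=12)
    for _ in range(2):
        print(perturb(rng.normal(size=36)))
```

Because the offset is a nonlinear function of the state rather than a fixed parameter draw, each training episode exposes the policy to a different, state-coupled dynamics error, which is the kind of reality gap that randomizing a fixed parameter set does not enumerate.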