🤖 AI Summary
Existing reinforcement learning approaches for robotics rarely exploit dynamic properties as inductive biases, which limits sample efficiency and generalization. This work proposes a dynamics-aware graph neural network architecture that integrates the tree-structured inertia propagation mechanism of the Articulated Body Algorithm into the policy network. By replacing physical quantities with learnable parameters, the method constructs a structured dynamics prior that explicitly embeds dynamics information into the policy actor. Evaluated on simulated humanoid, quadruped, and hopper robots, the approach improves sample efficiency and generalization to dynamics shifts over transformer-based and GNN baselines. It also transfers from simulation to physical Unitree G1 and Go2 robots, producing dynamic locomotion with real-time inference.
📝 Abstract
Recent work in reinforcement learning has shown that incorporating structural priors for articulated robots, such as link connectivity, into policy networks improves learning efficiency. However, dynamic properties, despite their fundamental role in determining how forces and motion propagate through the body, remain largely unexplored as an inductive bias for policy learning. To address this gap, we present the Articulated-Body Dynamics Network (ABD-Net), a novel graph neural network architecture grounded in the computational structure of forward dynamics. Specifically, we adapt the inertia propagation mechanism from the Articulated Body Algorithm, which aggregates inertial quantities from child to parent links along the kinematic tree, and replace the physical quantities with learnable parameters. Embedding ABD-Net into the policy actor yields dynamics-informed representations that capture how actions propagate through the body, leading to efficient and robust policy learning. In experiments with simulated humanoid, quadruped, and hopper robots, our approach achieves higher sample efficiency and better generalization to dynamics shifts than transformer-based and GNN baselines. We further validate the learned policy on real Unitree G1 and Go2 robots, state-of-the-art humanoid and quadruped platforms, generating dynamic, versatile, and robust locomotion behaviors through sim-to-real transfer with real-time inference.
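The child-to-parent aggregation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `propagate_child_to_parent`, the matrix size, and the update rule (modeled loosely on the articulated-inertia recursion, in which a parent's quantity is its own term plus each child's term mapped through a transform) are assumptions chosen for illustration. In a real policy network the `local` and `transforms` arrays would be trainable parameters; here they are fixed to identity matrices so the result is easy to check by hand.

```python
import numpy as np

def propagate_child_to_parent(parent, local, transforms):
    """Leaf-to-root aggregation over a kinematic tree (illustrative only).

    parent[i]     : index of link i's parent, -1 for the root; parent[i] < i.
    local[i]      : (d, d) matrix, hypothetical learnable stand-in for
                    link i's own inertia.
    transforms[i] : (d, d) matrix, hypothetical learnable stand-in for the
                    transform from link i's frame to its parent's frame.
    """
    agg = [m.copy() for m in local]
    # Because parent[i] < i, visiting links in reverse index order ensures
    # each child is fully aggregated before it is added to its parent.
    for i in range(len(parent) - 1, -1, -1):
        p = parent[i]
        if p < 0:
            continue  # the root has no parent to propagate into
        T = transforms[i]
        # T^T M T keeps the aggregated matrix symmetric if M is symmetric.
        agg[p] += T.T @ agg[i] @ T
    return agg

# Toy tree: link 0 is the root, link 1 hangs off it, links 2 and 3 are
# both children of link 1.
parent = [-1, 0, 1, 1]
d = 2
local = [np.eye(d) for _ in parent]
transforms = [np.eye(d) for _ in parent]

agg = propagate_child_to_parent(parent, local, transforms)
for i, m in enumerate(agg):
    print(i, m[0, 0])
```

With identity matrices throughout, each link's aggregated matrix is the identity scaled by the number of links in its subtree, which shows that every link's representation summarizes everything below it in the tree.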