🤖 AI Summary
Legged robots trained with reinforcement learning (RL) suffer from inefficient exploration, asymmetric behaviors, and poor generalization when the policy ignores the robot's morphological symmetry. Method: This work introduces symmetry priors to guide model-free policy learning, systematically comparing strictly equivariant network architectures against symmetry-aware data augmentation for legged control. The approach combines equivariant/invariant neural network design, symmetric data augmentation, and PPO/SAC algorithms, and is evaluated on loco-manipulation and bipedal locomotion tasks in simulation. Contribution/Results: Equivariant policies significantly improve sample efficiency (up to +40%), gait periodicity, and stability; enable zero-shot transfer to real-world quadrupedal and bipedal hardware platforms; and yield more natural, robust, and transferable policies. This study establishes equivariance as a critical inductive bias for RL-based control of symmetric robots, bridging geometric deep learning and legged robot autonomy.
📝 Abstract
Model-free reinforcement learning is a promising approach for autonomously solving challenging robotics control problems, but it faces exploration difficulties when it has no information about the robot’s morphology. Under-exploring the multiple modalities that arise from symmetric states leads to behaviors that are often unnatural and sub-optimal. This issue is particularly pronounced in robotic systems with morphological symmetries, such as legged robots, for which the resulting asymmetric and aperiodic behaviors compromise performance, robustness, and transferability to real hardware. To mitigate this challenge, we leverage symmetry to guide and improve exploration in policy learning via equivariance/invariance constraints. We investigate the efficacy of two approaches to incorporating symmetry: modifying the network architectures to be strictly equivariant/invariant, and using data augmentation to approximate equivariant/invariant actor-critics. We implement both methods on challenging loco-manipulation and bipedal locomotion tasks and compare them with an unconstrained baseline. We find that the strictly equivariant policy consistently outperforms the other methods in sample efficiency and task performance in simulation. Additionally, the symmetry-incorporated approaches exhibit better gait quality and higher robustness, and can be deployed zero-shot to hardware.
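
To make the two approaches concrete, the following is a minimal sketch on a toy system, not the authors' implementation. It assumes a single left/right reflection symmetry acting on a hypothetical 4-dimensional state and 2-dimensional action; the names `STATE_PERM`, `STATE_SIGN`, `ACTION_PERM`, `ACTION_SIGN`, `augment_batch`, and `symmetrize` are all illustrative. Data augmentation appends mirrored transitions to a batch (assuming the reward is invariant under the reflection). For exact equivariance, the sketch uses group averaging over the two-element reflection group as a simple stand-in for a strictly equivariant architecture, which the abstract does not specify.

```python
import numpy as np

# Hypothetical toy robot (an assumption for illustration only):
# state  = [left joint pos, right joint pos, forward velocity, lateral velocity]
# action = [left joint torque, right joint torque]
# The reflection g swaps left/right entries and negates lateral quantities.
STATE_PERM = np.array([1, 0, 2, 3])
STATE_SIGN = np.array([1.0, 1.0, 1.0, -1.0])
ACTION_PERM = np.array([1, 0])
ACTION_SIGN = np.array([1.0, 1.0])


def reflect(x, perm, sign):
    """Apply the reflection to the last axis of x (permute, then flip signs)."""
    return x[..., perm] * sign


def augment_batch(states, actions, rewards):
    """Symmetry data augmentation: append the mirrored copy of each transition.

    Assumes the reward is invariant under the reflection.
    """
    mirrored_states = reflect(states, STATE_PERM, STATE_SIGN)
    mirrored_actions = reflect(actions, ACTION_PERM, ACTION_SIGN)
    return (
        np.concatenate([states, mirrored_states]),
        np.concatenate([actions, mirrored_actions]),
        np.concatenate([rewards, rewards]),
    )


def symmetrize(policy):
    """Return a policy that is exactly equivariant: pi(g s) = g pi(s).

    Uses group averaging over {identity, g}; g is its own inverse here.
    """
    def equivariant_policy(s):
        direct = policy(s)
        mirrored = reflect(
            policy(reflect(s, STATE_PERM, STATE_SIGN)), ACTION_PERM, ACTION_SIGN
        )
        return 0.5 * (direct + mirrored)

    return equivariant_policy


# Usage: an arbitrary, unconstrained policy and its symmetrized version.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))
policy = lambda s: np.tanh(s @ W)
eq_policy = symmetrize(policy)

states = rng.normal(size=(8, 4))
actions = policy(states)
rewards = rng.normal(size=8)
aug_states, aug_actions, aug_rewards = augment_batch(states, actions, rewards)
```

The augmented batch only encourages a learned actor-critic to be approximately equivariant, whereas the symmetrized policy satisfies the constraint for every input by construction. That difference is the distinction the abstract draws between the augmentation-based and strictly equivariant approaches.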