🤖 AI Summary
Problem: Quadrupedal robots struggle on complex terrain because reinforcement learning (RL) controllers depend heavily on hand-crafted reward engineering, while motion imitation methods generalize poorly to new environments.
Method: This paper proposes a two-level hierarchical RL framework. A low-level policy is pre-trained on flat ground to imitate animal locomotion, establishing a motion prior. A high-level, goal-conditioned policy then uses proprioceptive and environmental observations to learn residual corrections on top of this prior for obstacle avoidance and navigation.
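The summary describes the high-level policy as learning residual corrections on top of the pre-trained prior. The following is a minimal sketch of that composition under stated assumptions: both policies output joint-position targets, the prior is frozen and sees only proprioception, and the residual is scaled and clipped before being added. `ResidualHierarchy`, `residual_scale`, and `residual_limit` are hypothetical names, and the paper may apply its residuals in a different action or latent space.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class ResidualHierarchy:
    """Hypothetical composition of a frozen motion prior and a residual policy.

    Assumption: both policies act in joint-position-target space, and the
    high-level residual is scaled and clipped before being added to the prior.
    """

    prior_policy: Callable[[Sequence[float]], Vector]     # frozen, imitation-pretrained low-level policy
    residual_policy: Callable[[Sequence[float]], Vector]  # trainable, goal-conditioned high-level policy
    residual_scale: float = 0.2   # hypothetical scale applied to the raw residual
    residual_limit: float = 0.5   # hypothetical per-joint clip on the scaled residual

    def act(self, proprio: Sequence[float], extero: Sequence[float],
            goal: Sequence[float]) -> Vector:
        # Assumption: the prior was trained on flat ground, so it sees only proprioception.
        base = self.prior_policy(proprio)
        # The high-level policy also sees terrain/obstacle observations and the goal.
        raw = self.residual_policy(list(proprio) + list(extero) + list(goal))
        if len(raw) != len(base):
            raise ValueError("residual and prior actions must have the same dimension")
        # Scale, then clip each correction so the prior's gait stays dominant.
        corrections = [
            max(-self.residual_limit, min(self.residual_limit, self.residual_scale * r))
            for r in raw
        ]
        return [b + c for b, c in zip(base, corrections)]
```

For example, with the default scale 0.2 and clip 0.5, a raw residual of `[1.0, 5.0, -1.0]` on a prior action of `[0.1, -0.2, 0.3]` yields approximately `[0.3, 0.3, 0.1]`; the second joint's correction saturates at the clip. Bounding the residual this way is one common design choice for keeping the learned corrections from overriding the motion prior entirely.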
Contribution/Results: It is the first work to decouple motion priors from generalizable residual control, thereby circumventing both the reward-design bottleneck and the generalization limits of pure imitation learning. In simulation, the learned residuals adapt to progressively more challenging uneven terrain while preserving the locomotion characteristics of the motion prior, and motion regularization improves over baselines trained without motion priors under similar reward setups. After sim-to-real transfer, real-world experiments on the ANYmal-D quadruped demonstrate smooth navigation and efficient obstacle negotiation on challenging, cluttered terrain.
📝 Abstract
Reinforcement learning (RL)-based legged locomotion controllers often require meticulous reward tuning to track velocities or goal positions while preserving smooth motion on various terrains. RL-based motion imitation methods that learn from demonstration data reduce reward engineering but fail to generalize to novel environments. We address this by proposing a hierarchical RL framework in which a low-level policy is first pre-trained to imitate animal motions on flat ground, thereby establishing motion priors. A subsequent high-level, goal-conditioned policy then builds on these priors, learning residual corrections that enable perceptive locomotion, local obstacle avoidance, and goal-directed navigation across diverse and rugged terrains. Simulation experiments illustrate the effectiveness of the learned residuals in adapting to progressively more challenging uneven terrains while preserving the locomotion characteristics provided by the motion priors. Furthermore, our results demonstrate improvements in motion regularization over baseline models trained without motion priors under similar reward setups. Real-world experiments with an ANYmal-D quadruped robot confirm our policy's capability to generalize animal-like locomotion skills to complex terrains, demonstrating smooth and efficient locomotion and local navigation amidst challenging terrains with obstacles.
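The abstract does not state how the low-level policy is rewarded for imitating animal motions. A common choice in RL motion imitation, for example DeepMimic-style tracking, is a weighted sum of exponentiated negative tracking errors between reference-motion features and the robot's features. The sketch below assumes that form; the feature names, weights, and scales are hypothetical, illustrative placeholders rather than values from the paper.

```python
import math
from typing import Dict, Sequence


def tracking_term(target: Sequence[float], actual: Sequence[float], scale: float) -> float:
    """Return exp(-scale * ||target - actual||^2); equals 1.0 for perfect tracking."""
    if len(target) != len(actual):
        raise ValueError("target and actual must have the same length")
    sq_err = sum((t - a) ** 2 for t, a in zip(target, actual))
    return math.exp(-scale * sq_err)


def imitation_reward(ref: Dict[str, Sequence[float]], robot: Dict[str, Sequence[float]],
                     weights: Dict[str, float], scales: Dict[str, float]) -> float:
    """Weighted sum of tracking terms over every feature named in ``weights``."""
    return sum(w * tracking_term(ref[k], robot[k], scales[k]) for k, w in weights.items())


# Hypothetical feature weights (summing to 1.0) and error scales; illustrative only.
WEIGHTS = {"joint_pos": 0.65, "joint_vel": 0.1, "end_effector": 0.15, "root": 0.1}
SCALES = {"joint_pos": 5.0, "joint_vel": 0.1, "end_effector": 40.0, "root": 10.0}
```

Because each term lies in (0, 1] and the weights sum to 1.0, the reward peaks at 1.0 when the robot matches the reference exactly and decays smoothly as tracking error grows, which gives the policy a dense signal without hand-tuning separate velocity or smoothness rewards.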