🤖 AI Summary
To address unnatural gait transitions, discontinuous target tracking, and weak recovery from instability in multi-skilled locomotion of quadrupedal robots, this paper proposes a hierarchical reinforcement learning framework. At the outer level, the gait-switching distance is modeled as an optimizable reward parameter, enabling task-driven, end-to-end smooth gait transitions. At the inner level, a dual-policy architecture—comprising low-level single-skill policies and a high-level weighted fusion policy—is trained using contact-aware rewards and the Proximal Policy Optimization (PPO) algorithm. The framework supports arbitrary combinations of four agile gaits (walking, trotting, bounding, and galloping), omnidirectional target tracking, and rapid recovery from unexpected disturbances. Validated both in simulation and on a physical Unitree A1 platform, it achieves smoother transitions, a 98.7% disturbance recovery rate, and—critically—the first stable execution and seamless transition of the galloping gait on real hardware.
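The dual-policy idea above can be sketched in a few lines: a high-level policy emits one weight per pre-trained single-skill policy, and the executed action is the weighted combination of the skills' actions. The sketch below is a minimal illustration, assuming softmax weighting and toy linear "policies"; all names (`SKILLS`, `OBS_DIM`, `compose_action`) are illustrative, not from the paper.

```python
import math
import random

# Four single-skill gaits, as in the framework described above.
SKILLS = ["walk", "trot", "bound", "gallop"]
OBS_DIM, ACTION_DIM = 8, 12  # 12 = 3 joint targets per leg on a quadruped like the A1

random.seed(0)
# Stand-ins for frozen single-skill policies: a random linear map obs -> action.
skill_weights = {
    s: [[random.uniform(-0.1, 0.1) for _ in range(OBS_DIM)] for _ in range(ACTION_DIM)]
    for s in SKILLS
}

def skill_action(skill, obs):
    """Action of one frozen single-skill policy (toy linear stand-in)."""
    return [sum(w * o for w, o in zip(row, obs)) for row in skill_weights[skill]]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def compose_action(obs, high_level_logits):
    """Convex combination of single-skill actions, weighted by the high-level policy."""
    w = softmax(high_level_logits)
    actions = [skill_action(s, obs) for s in SKILLS]
    return [sum(wi * a[j] for wi, a in zip(w, actions)) for j in range(ACTION_DIM)]

obs = [0.1 * i for i in range(OBS_DIM)]
# A logit vector favoring "walk"; as the goal distance changes, the
# high-level policy would shift this mass toward faster gaits.
action = compose_action(obs, [2.0, 0.5, 0.1, -1.0])
```

Because the weights form a convex combination, the blended action varies continuously as the high-level logits shift, which is what yields smooth transitions rather than a hard switch.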
📝 Abstract
This paper develops a hierarchical learning and optimization framework that can learn and achieve well-coordinated multi-skill locomotion. The learned multi-skill policy can switch between skills automatically and naturally while tracking arbitrarily positioned goals, and can recover from failures promptly. The proposed framework is composed of a deep reinforcement learning process and an optimization process. First, the contact pattern is incorporated into the reward terms for learning different types of gaits as separate policies, without the need for any other references. Then, a higher-level policy is learned to generate weights for the individual policies, composing multi-skill locomotion in a goal-tracking task setting. Skills are automatically and naturally switched according to the distance to the goal. The proper distances for skill switching are incorporated into the reward calculation for learning the high-level policy and are updated by an outer optimization loop as learning progresses. We first demonstrated successful multi-skill locomotion on comprehensive tasks with a simulated Unitree A1 quadruped robot. We then deployed the learned policy in the real world, showcasing trotting, bounding, galloping, and their natural transitions as the goal position changes. Moreover, the learned policy can react to unexpected failures at any time, perform prompt recovery, and resume locomotion successfully. Compared to a discrete switch between single skills, which failed to transition to galloping in the real world, our proposed approach achieves all the learned agile skills with smoother and more continuous skill transitions.
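The abstract's outer optimization idea—treating the distances at which skills should switch as tunable parameters that feed into the high-level policy's reward—can be sketched as below. The threshold values, skill ordering, and reward shape are assumptions for illustration, not the paper's actual parameters.

```python
# Skill indices, ordered from slow (near the goal) to fast (far away).
# 0=walk, 1=trot, 2=bound, 3=gallop -- ordering assumed for illustration.

def target_skill(distance, thresholds=(0.5, 1.5, 3.0)):
    """Map distance-to-goal to the skill expected at that range.

    `thresholds` are the switching distances the outer loop would tune;
    the numeric values here are placeholders.
    """
    for i, d in enumerate(thresholds):
        if distance < d:
            return i
    return len(thresholds)

def switch_reward(active_skill, distance, thresholds=(0.5, 1.5, 3.0)):
    """Reward term for the high-level policy: +1 when the dominant
    skill matches the distance-appropriate one, else 0."""
    return 1.0 if active_skill == target_skill(distance, thresholds) else 0.0
```

In the framework described above, an outer loop would adjust `thresholds` as training progresses, so the notion of "the right gait for this distance" is itself optimized rather than fixed by hand.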