Multi-Gait Learning for Humanoid Robots Using Reinforcement Learning with Selective Adversarial Motion Prior

📅 2026-04-21
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the challenge of enabling a humanoid robot to master diverse gaits within a unified reinforcement learning framework, where the conflicting requirements of stability and agility often hinder performance. The authors propose a selective Adversarial Motion Prior (AMP) mechanism that activates AMP only for highly periodic, stability-critical gaits (walking, goose-stepping, and stair climbing), where it accelerates convergence and suppresses erratic behavior, and deactivates it for highly dynamic maneuvers (running and jumping) to preserve agility. The policy is trained in simulation using PPO with domain randomization and transfers zero-shot to a physical 12-degree-of-freedom humanoid platform. Compared with applying AMP uniformly, selective AMP performs better across all five gaits: stability-oriented gaits show faster convergence, lower tracking error, and higher success rates, without compromising the agility of the dynamic motions.
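The summary mentions domain randomization without detailing it. As a rough illustration of the general idea only, the sketch below resamples a few simulator parameters at the start of each training episode so the policy does not overfit to one set of dynamics. The parameter names (`friction`, `payload_kg`, `motor_strength_scale`) and their ranges are hypothetical placeholders, not values reported by the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Physics parameters resampled per episode (hypothetical selection)."""
    friction: float
    payload_kg: float
    motor_strength_scale: float


# Hypothetical (low, high) ranges chosen for illustration only;
# the paper's actual randomization ranges are not given in this summary.
RANGES = {
    "friction": (0.4, 1.2),
    "payload_kg": (-1.0, 2.0),
    "motor_strength_scale": (0.9, 1.1),
}


def sample_sim_params(rng: random.Random) -> SimParams:
    """Draw each parameter uniformly and independently from its range."""
    return SimParams(
        **{name: rng.uniform(low, high) for name, (low, high) in RANGES.items()}
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a training run is reproducible
    for episode in range(3):
        print(episode, sample_sim_params(rng))
```

In a real training loop the sampled values would be written into the simulator before each environment reset; that step depends on the simulator and is left out here.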

📝 Abstract
Learning diverse locomotion skills for humanoid robots in a unified reinforcement learning framework remains challenging due to the conflicting requirements of stability and dynamic expressiveness across different gaits. We present a multi-gait learning approach that enables a humanoid robot to master five distinct gaits -- walking, goose-stepping, running, stair climbing, and jumping -- using a consistent policy structure, action space, and reward formulation. The key contribution is a selective Adversarial Motion Prior (AMP) strategy: AMP is applied to periodic, stability-critical gaits (walking, goose-stepping, stair climbing) where it accelerates convergence and suppresses erratic behavior, while being deliberately omitted for highly dynamic gaits (running, jumping) where its regularization would over-constrain the motion. Policies are trained via PPO with domain randomization in simulation and deployed on a physical 12-DOF humanoid robot through zero-shot sim-to-real transfer. Quantitative comparisons demonstrate that selective AMP outperforms a uniform AMP policy across all five gaits, achieving faster convergence, lower tracking error, and higher success rates on stability-focused gaits without sacrificing the agility required for dynamic ones.
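The selective AMP idea described in the abstract can be sketched as a per-gait switch on the style reward. This is a minimal illustration, not the authors' implementation: the gait grouping follows the abstract, the style-reward formula is the one from the original AMP formulation with a least-squares discriminator (assumed here, not confirmed for this paper), and the function names and the weight `w_style` are hypothetical.

```python
# Gait grouping taken from the abstract.
AMP_GAITS = frozenset({"walking", "goose-stepping", "stair climbing"})
DYNAMIC_GAITS = frozenset({"running", "jumping"})


def amp_style_reward(disc_score: float) -> float:
    """Style reward from a discriminator score.

    Uses max(0, 1 - 0.25 * (d - 1)^2), the form from the original AMP
    formulation. Assumption: this paper may use a different mapping.
    """
    return max(0.0, 1.0 - 0.25 * (disc_score - 1.0) ** 2)


def total_reward(
    gait: str,
    task_reward: float,
    disc_score: float,
    w_style: float = 0.5,  # hypothetical weight, not from the paper
) -> float:
    """Add the AMP style term only for stability-critical gaits."""
    if gait in AMP_GAITS:
        return task_reward + w_style * amp_style_reward(disc_score)
    if gait in DYNAMIC_GAITS:
        # AMP is deliberately omitted so the motion is not over-constrained.
        return task_reward
    raise ValueError(f"unknown gait: {gait!r}")


if __name__ == "__main__":
    for gait in ("walking", "running"):
        print(gait, total_reward(gait, task_reward=1.0, disc_score=0.0))
```

With the same task reward and discriminator score, the walking command receives an extra style term while the running command is scored on the task reward alone, which is the behavior the abstract attributes to selective AMP.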
Problem

Research questions and friction points this paper is trying to address.

humanoid robots
multi-gait learning
reinforcement learning
stability
dynamic expressiveness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective AMP
multi-gait learning
humanoid locomotion
reinforcement learning
sim-to-real transfer