APEX: Action Priors Enable Efficient Exploration for Skill Imitation on Articulated Robots

📅 2025-05-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing imitation learning methods (e.g., AMP) suffer from mode collapse, hindering sim-to-real transfer and limiting behavioral diversity. To address this, we propose APEX—a framework that embeds expert action priors into PPO-based reinforcement learning. APEX introduces a decaying action-guidance mechanism and a multi-critic co-optimization architecture, jointly enhancing exploration while preserving stylistic consistency with expert demonstrations. Crucially, APEX requires only flat-ground locomotion data yet achieves style-preserving generalization to complex terrains—including stairs and uneven surfaces. Evaluated on the Unitree Go2 quadruped, APEX attains a peak real-world speed of 3.3 m/s, supports diverse agile gaits, and enables adaptive gait switching. It significantly reduces the sim-to-real performance gap, demonstrating strong robustness and generalizability, and establishes an efficient, generalizable approach to imitation learning in embodied intelligence.
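The decaying action-guidance mechanism described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the exponential schedule, the `decay_rate` value, and the linear blend of policy and expert actions are all assumptions for clarity.

```python
import numpy as np

def decayed_prior_action(policy_action, expert_action, step, decay_rate=1e-4):
    """Blend the policy's action with an expert reference action.

    The blend weight beta starts near 1 (mostly expert-guided) and decays
    toward 0 over training, so early exploration is biased toward the
    demonstrations while the mature policy eventually acts on its own.
    Exponential decay and the 1e-4 rate are illustrative assumptions.
    """
    beta = np.exp(-decay_rate * step)
    return beta * np.asarray(expert_action, dtype=float) \
        + (1.0 - beta) * np.asarray(policy_action, dtype=float)
```

At `step=0` the executed action equals the expert's; for large `step` it converges to the raw policy action, recovering unconstrained RL exploration.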

📝 Abstract
Learning by imitation provides an effective way for robots to develop well-regulated complex behaviors and directly benefit from natural demonstrations. State-of-the-art imitation learning (IL) approaches typically leverage Adversarial Motion Priors (AMP), which, despite their impressive results, suffer from two key limitations. They are prone to mode collapse, which often leads to overfitting to the simulation environment and thus an increased sim-to-real gap, and they struggle to learn diverse behaviors effectively. To overcome these limitations, we introduce APEX (Action Priors enable Efficient eXploration): a simple yet versatile imitation learning framework that integrates demonstrations directly into reinforcement learning (RL), maintaining high exploration while grounding behavior with expert-informed priors. We achieve this through decaying action priors, which initially bias exploration toward expert demonstrations but gradually allow the policy to explore independently, complemented by a multi-critic RL framework that effectively balances stylistic consistency with task performance. Our approach achieves sample-efficient imitation learning and enables the acquisition of diverse skills within a single policy. APEX generalizes to varying velocities and preserves reference-like styles across complex tasks such as navigating rough terrain and climbing stairs, utilizing only flat-terrain kinematic motion data as a prior. We validate our framework through extensive hardware experiments on the Unitree Go2 quadruped. There, APEX yields diverse and agile locomotion gaits, inherent gait transitions, and, to the best of our knowledge, the highest reported speed for the platform (peak velocity of ~3.3 m/s on hardware). Our results establish APEX as a compelling alternative to existing IL methods, offering better efficiency, adaptability, and real-world performance.
Problem

Research questions and friction points this paper is trying to address.

Overcoming mode collapse in imitation learning for robots
Enhancing diverse behavior acquisition in robotic imitation
Reducing sim-to-real gap in imitation learning frameworks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates demonstrations into reinforcement learning
Uses decaying action priors for exploration
Employs multi-critic RL for style-task balance
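The multi-critic idea in the last bullet can be illustrated with a small sketch. This is not the paper's code: the per-critic advantage normalization and the fixed `w_task`/`w_style` weights are assumptions chosen to show how separate style and task critics can be balanced without one reward scale drowning out the other.

```python
import numpy as np

def combine_advantages(task_adv, style_adv, w_task=0.6, w_style=0.4):
    """Combine advantage estimates from separate task and style critics.

    Each stream is standardized independently before weighting, so the
    policy gradient trades off task progress against stylistic
    consistency regardless of the raw reward magnitudes. The weights
    and normalization scheme here are illustrative assumptions.
    """
    def standardize(a):
        a = np.asarray(a, dtype=float)
        return (a - a.mean()) / (a.std() + 1e-8)

    return w_task * standardize(task_adv) + w_style * standardize(style_adv)
```

A single-critic setup would instead sum the rewards before learning one value function, which lets a large task reward mask the style signal; keeping critics separate preserves both gradients.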