APEX: Action Priors Enable Efficient Exploration for Robust Motion Tracking on Legged Robots

📅 2025-11-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing legged robot motion tracking methods rely heavily on predefined reference trajectories and extensive manual parameter tuning, limiting generalization and deployment efficiency. This paper proposes APEX, a plug-and-play reinforcement learning extension framework that incorporates expert demonstrations into training via a decaying action prior mechanism—eliminating the need for reference data during deployment. By integrating a multi-critic architecture with policy regularization, APEX enhances sample efficiency, robustness, and cross-terrain, cross-speed, and cross-gait generalization. Evaluated in simulation and on real-world Unitree Go2 hardware, APEX significantly improves training stability and motion tracking accuracy, while enabling reliable transfer under modified reward functions. The core innovations are (i) a dynamically decaying action prior that balances imitation and exploration, and (ii) a collaborative multi-critic constraint mechanism that stabilizes policy learning and improves trajectory fidelity.
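The decaying action prior described above can be illustrated with a minimal sketch. The class name, linear decay schedule, and blending rule here are assumptions for illustration only, not APEX's actual formulation:

```python
class DecayingActionPrior:
    """Hypothetical sketch of a decaying action prior (not APEX's exact scheme).

    Early in training, exploration is biased toward an expert/prior action;
    the bias decays to zero, so deployment needs no reference data.
    """

    def __init__(self, decay_steps: int, beta0: float = 1.0):
        self.decay_steps = decay_steps  # steps over which the prior decays
        self.beta0 = beta0              # initial prior weight
        self.step = 0

    def mixing_weight(self) -> float:
        # Linear decay from beta0 down to 0 (the schedule is an assumption).
        return self.beta0 * max(0.0, 1.0 - self.step / self.decay_steps)

    def blend(self, policy_action: float, prior_action: float) -> float:
        # Mix the policy's action with the expert prior, then advance the clock.
        beta = self.mixing_weight()
        self.step += 1
        return (1.0 - beta) * policy_action + beta * prior_action
```

At step 0 the blended action equals the prior; once `step` exceeds `decay_steps`, the policy acts entirely on its own, which is why no reference data is needed at deployment.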

📝 Abstract
Learning natural, animal-like locomotion from demonstrations has become a core paradigm in legged robotics. Despite the recent advancements in motion tracking, most existing methods demand extensive tuning and rely on reference data during deployment, limiting adaptability. We present APEX (Action Priors enable Efficient Exploration), a plug-and-play extension to state-of-the-art motion tracking algorithms that eliminates any dependence on reference data during deployment, improves sample efficiency, and reduces parameter tuning effort. APEX integrates expert demonstrations directly into reinforcement learning (RL) by incorporating decaying action priors, which initially bias exploration toward expert demonstrations but gradually allow the policy to explore independently. This is combined with a multi-critic framework that balances task performance with motion style. Moreover, APEX enables a single policy to learn diverse motions and transfer reference-like styles across different terrains and velocities, while remaining robust to variations in reward design. We validate the effectiveness of our method through extensive experiments in both simulation and on a Unitree Go2 robot. By leveraging demonstrations to guide exploration during RL training, without imposing explicit bias toward them, APEX enables legged robots to learn with greater stability, efficiency, and generalization. We believe this approach paves the way for guidance-driven RL to boost natural skill acquisition in a wide array of robotic tasks, from locomotion to manipulation. Website and code: https://marmotlab.github.io/APEX/.
Problem

Research questions and friction points this paper is trying to address.

Eliminates dependence on reference data during robot deployment
Improves sample efficiency and reduces parameter tuning effort
Enables robust motion tracking across diverse terrains and velocities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses decaying action priors for efficient exploration
Employs multi-critic framework balancing task and style
Enables single policy learning across diverse conditions
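The multi-critic idea of balancing task performance against motion style could be sketched as follows. The per-critic standardization and the fixed blending weights are assumptions for illustration; the paper's collaborative multi-critic constraint mechanism may differ:

```python
def combine_advantages(task_adv, style_adv, w_task=0.7, w_style=0.3):
    """Hypothetical multi-critic advantage blend (weights are assumptions).

    Each critic's advantages are standardized separately so that neither
    the task reward scale nor the style reward scale dominates the update.
    """
    def standardize(xs):
        mean = sum(xs) / len(xs)
        std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
        std = std if std > 0 else 1.0  # guard against constant advantages
        return [(x - mean) / std for x in xs]

    t = standardize(task_adv)
    s = standardize(style_adv)
    return [w_task * a + w_style * b for a, b in zip(t, s)]
```

Keeping one critic per reward group, rather than summing rewards into a single scalar, is what lets the weights above trade off tracking fidelity against style without retuning the rewards themselves.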
Authors

Shivam Sood
Student, National University of Singapore
Robotics · Controls · Reinforcement Learning · Legged Locomotion

Laukik B. Nakhwa
Department of Mechanical Engineering, College of Design and Engineering, National University of Singapore, Singapore

Sun Ge
Department of Mechanical Engineering, College of Design and Engineering, National University of Singapore, Singapore

Yuhong Cao
National University of Singapore
Robot Learning · Path Planning

Jin Cheng
Doctoral student at ETH Zürich
Loco-manipulation · Robot Learning

Fatemah Zargarbashi
Department of Computer Science, ETH Zurich, Switzerland

Taerim Yoon
PhD Candidate, Korea University
Robotics · Artificial Intelligence

Sungjoon Choi
Korea University
Robotics

Stelian Coros
Department of Computer Science, ETH Zurich, Switzerland

G. Sartoretti
Department of Mechanical Engineering, College of Design and Engineering, National University of Singapore, Singapore