Jump-Start Reinforcement Learning with Self-Evolving Priors for Extreme Monopedal Locomotion

📅 2025-07-01

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This paper addresses the challenge of using reinforcement learning to train robust single-leg hopping policies for quadrupedal robots under extreme underactuation and on unpredictable terrain. It proposes JumpER, a multi-stage training framework that avoids both external expert priors and handcrafted reward shaping. JumpER uses a three-stage self-evolving curriculum that progresses from action modality to observation space and finally to task objective, augmented by self-guided policy distillation and dynamic prior generation, so that the policy evolves autonomously from simple to complex settings. According to the authors, JumpER is the first method to achieve robust monopedal hopping with a quadruped on unpredictable terrain, traversing gaps up to 60 cm wide, irregularly spaced stairs, and stepping stones spaced 15 cm to 35 cm apart. The reported experiments show improved training stability and cross-terrain generalization over conventional methods in these extreme environments.

📝 Abstract

Reinforcement learning (RL) has shown great potential in enabling quadruped robots to perform agile locomotion. However, directly training policies to simultaneously handle dual extreme challenges, i.e., extreme underactuation and extreme terrains, as in monopedal hopping tasks, remains highly challenging due to unstable early-stage interactions and unreliable reward feedback. To address this, we propose JumpER (jump-start reinforcement learning via self-evolving priors), an RL training framework that structures policy learning into multiple stages of increasing complexity. By dynamically generating self-evolving priors through iterative bootstrapping of previously learned policies, JumpER progressively refines and enhances guidance, thereby stabilizing exploration and policy optimization without relying on external expert priors or handcrafted reward shaping. Specifically, when integrated with a structured three-stage curriculum that incrementally evolves action modality, observation space, and task objective, JumpER enables quadruped robots to achieve robust monopedal hopping on unpredictable terrains for the first time. Remarkably, the resulting policy effectively handles challenging scenarios that traditional methods struggle to conquer, including wide gaps up to 60 cm, irregularly spaced stairs, and stepping stones with distances varying from 15 cm to 35 cm. JumpER thus provides a principled and scalable approach for addressing locomotion tasks under the dual challenges of extreme underactuation and extreme terrains.
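The idea of bootstrapping priors from previously learned policies can be illustrated with a deliberately tiny toy. The sketch below is not the paper's algorithm: it assumes a "policy" is a single scalar parameter, each stage is a quadratic objective with a hypothetical `target`, and the prior acts as a regularizer toward the frozen previous-stage policy whose weight `beta` is annealed to zero. All names (`Stage`, `train_stage`, `jump_start`) are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Stage:
    name: str
    target: float  # hypothetical stand-in for the stage's optimum


def train_stage(stage: Stage, prior: Optional[float],
                steps: int = 200, lr: float = 0.1, beta0: float = 1.0) -> float:
    """Gradient descent on (theta - target)^2 + beta * (theta - prior)^2."""
    # Warm start from the prior when one exists (assumption of this sketch).
    theta = 0.0 if prior is None else prior
    for t in range(steps):
        # Guidance weight decays linearly, so the prior matters most early on.
        beta = beta0 * (1.0 - t / steps)
        grad = 2.0 * (theta - stage.target)
        if prior is not None:
            grad += 2.0 * beta * (theta - prior)
        theta -= lr * grad
    return theta


def jump_start(stages: List[Stage]) -> List[Tuple[str, float]]:
    """Train stages in order; each learned policy becomes the next prior."""
    prior: Optional[float] = None
    history = []
    for stage in stages:
        theta = train_stage(stage, prior)
        history.append((stage.name, theta))
        prior = theta  # the prior "evolves" with the policy itself
    return history


if __name__ == "__main__":
    for name, theta in jump_start([Stage("easy", 1.0),
                                   Stage("medium", 2.0),
                                   Stage("hard", 3.0)]):
        print(f"{name}: theta = {theta:.3f}")
```

In this toy, the prior only stabilizes the early part of each stage and then fades out, so no external expert is ever needed: the guidance comes entirely from the learner's own earlier solution.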

Problem

Research questions and friction points this paper is trying to address.

Monopedal hopping for quadruped robots on extreme terrains

Extreme underactuation in locomotion tasks

Unstable early-stage RL training when no expert priors or handcrafted reward shaping are available

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-stage RL training with increasing complexity

Self-evolving priors via iterative policy bootstrapping

Structured three-stage curriculum that incrementally evolves action modality, observation space, and task objective
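One way to picture a curriculum that evolves one axis at a time is as a sequence of stage configurations. The sketch below is an assumption-laden illustration, not the paper's configuration: the page does not say what each stage actually contains, so the values (`"action_simple"`, `"obs_full"`, and so on) are placeholders, and the names `StageConfig`, `BASE`, and `CURRICULUM` are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class StageConfig:
    action_modality: str
    observation_space: str
    task_objective: str


def changed_axes(prev: StageConfig, curr: StageConfig) -> List[str]:
    """Names of the fields that differ between two consecutive configs."""
    return [f.name for f in fields(StageConfig)
            if getattr(prev, f.name) != getattr(curr, f.name)]


# Placeholder values only; the real stage contents are not given on this page.
BASE = StageConfig("action_simple", "obs_simple", "task_simple")
CURRICULUM = [
    StageConfig("action_full", "obs_simple", "task_simple"),  # evolve actions
    StageConfig("action_full", "obs_full", "task_simple"),    # evolve observations
    StageConfig("action_full", "obs_full", "task_full"),      # evolve objective
]


def curriculum_axes(base: StageConfig,
                    stages: List[StageConfig]) -> List[List[str]]:
    """For each stage, list which axes changed relative to the previous one."""
    result, prev = [], base
    for stage in stages:
        result.append(changed_axes(prev, stage))
        prev = stage
    return result


if __name__ == "__main__":
    for i, axes in enumerate(curriculum_axes(BASE, CURRICULUM), start=1):
        print(f"stage {i}: changes {axes}")
```

Changing a single axis per stage keeps each transition small, which is the property that lets the previous stage's policy serve as a useful prior for the next one.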
