🤖 AI Summary
This work addresses the challenge of scaling curriculum learning to complex, diverse terrain task spaces, where traditional approaches struggle because no explicit difficulty structure exists. The authors propose LP-ACRL, a learning-progress-based automatic curriculum reinforcement learning framework that adaptively shapes the task-sampling distribution via online estimation of the agent's learning progress. This eliminates the need for predefined task difficulty, enabling fully automatic curriculum generation without prior knowledge of task hardness and demonstrating strong scalability across a broad robotic task space. Experiments show that an ANYmal D quadruped robot trained with LP-ACRL achieves stable high-speed locomotion, reaching linear speeds of 2.5 m/s and angular speeds of 3.0 rad/s, across varied terrains including stairs, slopes, gravel, and low-friction flat ground, substantially surpassing the performance limits of existing methods.
📝 Abstract
Curriculum learning has demonstrated substantial effectiveness in robot learning. However, it still faces limitations when scaling to complex, wide-ranging task spaces. Such task spaces often lack a well-defined difficulty structure, making the difficulty ordering required by previous methods challenging to define. We propose a Learning Progress-based Automatic Curriculum Reinforcement Learning (LP-ACRL) framework, which estimates the agent's learning progress online and adaptively adjusts the task-sampling distribution, thereby enabling automatic curriculum generation without prior knowledge of the difficulty distribution over the task space. Policies trained with LP-ACRL enable the ANYmal D quadruped to achieve and maintain stable, high-speed locomotion at 2.5 m/s linear velocity and 3.0 rad/s angular velocity across diverse terrains, including stairs, slopes, gravel, and low-friction flat surfaces, whereas previous methods have generally been limited to high speeds on flat terrain or low speeds on complex terrain. Experimental results demonstrate that LP-ACRL exhibits strong scalability and real-world applicability, providing a robust baseline for future research on curriculum generation in complex, wide-ranging robotic learning task spaces.
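The core mechanism, shaping the task-sampling distribution from an online estimate of learning progress, can be sketched in a few lines. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a discrete task set, measures per-task learning progress as the absolute change between the means of two halves of a recent-return window (in the spirit of absolute-learning-progress curricula), and mixes softmax sampling over progress with uniform exploration. The class name, window size, and temperature are all hypothetical.

```python
import math
import random
from collections import defaultdict, deque


class LPTaskSampler:
    """Minimal sketch of learning-progress-based task sampling.

    Assumptions (not from the paper): discrete tasks, window-based
    progress estimates, softmax sampling with uniform exploration.
    """

    def __init__(self, tasks, window=20, temperature=0.1, eps=0.2):
        self.tasks = list(tasks)
        self.temperature = temperature
        self.eps = eps  # probability of sampling a task uniformly at random
        self.returns = defaultdict(lambda: deque(maxlen=window))

    def update(self, task, episode_return):
        # Record the latest episode return for this task.
        self.returns[task].append(episode_return)

    def learning_progress(self, task):
        # Absolute difference between the means of the older and newer
        # halves of the return window; 0 until enough data accumulates.
        r = list(self.returns[task])
        if len(r) < 4:
            return 0.0
        half = len(r) // 2
        old, new = r[:half], r[half:]
        return abs(sum(new) / len(new) - sum(old) / len(old))

    def sample(self):
        # Uniform exploration keeps every task reachable.
        if random.random() < self.eps:
            return random.choice(self.tasks)
        # Otherwise sample proportionally to softmax of learning progress.
        scores = [self.learning_progress(t) / self.temperature for t in self.tasks]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        return random.choices(self.tasks, weights=weights)[0]
```

A training loop would call `sample()` to pick the next terrain/command task, run an episode, then call `update()` with the episode return; tasks whose returns are changing fastest are sampled most often, so the curriculum tracks the frontier of what the agent is currently learning.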