🤖 AI Summary
To address the limited ability of quadrupedal robots to adapt to unknown payloads across diverse terrains (e.g., flat ground, slopes, stairs), this paper proposes a hierarchical adaptive reinforcement learning framework: a nominal policy learns baseline locomotion behaviors, while an adaptive policy learns corrective actions that compensate in real time for payload- and terrain-induced disturbances. The framework achieves end-to-end dual adaptation, simultaneously handling payload variations and terrain changes, without relying on dynamic models, predefined gaits, or manual tuning. The approach combines large-scale simulation training in Isaac Gym with an adaptive RL policy network, and is validated via hardware deployment on the Unitree Go1 platform. Experiments demonstrate substantial improvements in body-height and velocity tracking accuracy, along with superior robustness to abrupt static and dynamic payload changes compared with state-of-the-art MPC-based methods.
📝 Abstract
Quadrupedal robots are increasingly deployed for load-carrying tasks across diverse terrains. While Model Predictive Control (MPC)-based methods can account for payload variations, they often depend on predefined gait schedules or trajectory generators, limiting their adaptability in unstructured environments. To address these limitations, we propose an Adaptive Reinforcement Learning (RL) framework that enables quadrupedal robots to dynamically adapt to both varying payloads and diverse terrains. The framework consists of a nominal policy responsible for baseline locomotion and an adaptive policy that learns corrective actions to preserve stability and improve command tracking under payload variations. We validate the proposed approach through large-scale simulation experiments in Isaac Gym and real-world hardware deployment on a Unitree Go1 quadruped. The controller was tested on flat ground, slopes, and stairs under both static and dynamic payload changes. Across all settings, our adaptive controller consistently outperformed the baseline controller in tracking body-height and velocity commands, demonstrating enhanced robustness and adaptability without requiring explicit gait design or manual tuning.
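The core idea of the abstract's two-policy design can be sketched as a nominal policy producing baseline actions and an adaptive policy adding small corrective terms. The sketch below is a minimal illustration of that composition only; the class names, network shapes, and correction scale are hypothetical and do not reflect the paper's actual architecture or training procedure.

```python
# Hypothetical sketch of the nominal + adaptive policy composition.
# All names and dimensions are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

class TinyPolicy:
    """Stand-in policy: one random linear layer with tanh squashing."""
    def __init__(self, obs_dim, act_dim, scale=1.0):
        self.W = rng.standard_normal((act_dim, obs_dim)) * 0.1
        self.scale = scale  # bounds the magnitude of the output

    def __call__(self, obs):
        return self.scale * np.tanh(self.W @ obs)

obs_dim, act_dim = 48, 12                 # e.g., proprioception in, joint targets out
nominal = TinyPolicy(obs_dim, act_dim)            # baseline locomotion actions
adaptive = TinyPolicy(obs_dim, act_dim, scale=0.2)  # small corrective actions

obs = rng.standard_normal(obs_dim)
# Corrective action is added on top of the baseline action.
action = nominal(obs) + adaptive(obs)
print(action.shape)
```

Bounding the adaptive term (here via a small `scale`) keeps the correction a perturbation of the baseline gait rather than a replacement for it, which is consistent with the abstract's framing of the adaptive policy as preserving stability under payload changes.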