🤖 AI Summary
To enable adaptive humanoid locomotion on complex terrain and under high-speed commands, this paper proposes an “imitation pretraining + reinforcement fine-tuning” framework. The policy is first pretrained by behavior cloning of a model predictive control (MPC) expert and then fine-tuned with proximal policy optimization (PPO). The key innovation is model-assumption-based regularization (MAR), which aligns the policy with the expert’s actions only in states where the expert’s model assumption holds, thereby mitigating catastrophic forgetting during fine-tuning. Evaluated in simulation and on the full-size humanoid robot Digit, the method achieves a forward speed of 1.5 m/s and robust locomotion across slippery, sloped, uneven, and sandy terrain.
📝 Abstract
Humanoid locomotion is a challenging task due to its inherent complexity and high-dimensional dynamics, as well as the need to adapt to diverse and unpredictable environments. In this work, we introduce a novel learning framework for effectively training a humanoid locomotion policy that imitates the behavior of a model-based controller while extending its capabilities to handle more complex locomotion tasks, such as more challenging terrain and higher velocity commands. Our framework consists of three key components: pre-training through imitation of the model-based controller, fine-tuning via reinforcement learning, and model-assumption-based regularization (MAR) during fine-tuning. In particular, MAR aligns the policy with actions from the model-based controller only in states where the model assumption holds to prevent catastrophic forgetting. We evaluate the proposed framework through comprehensive simulation tests and hardware experiments on a full-size humanoid robot, Digit, demonstrating a forward speed of 1.5 m/s and robust locomotion across diverse terrains, including slippery, sloped, uneven, and sandy terrains.