PreCi: Pretraining and Continual Improvement of Humanoid Locomotion via Model-Assumption-Based Regularization

📅 2025-04-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing adaptive locomotion for humanoid robots on complex terrain and under high-speed commands, this paper proposes an "imitation pretraining + reinforcement fine-tuning" framework. The policy is first pretrained by behavior cloning from a model predictive control (MPC) expert, then fine-tuned with proximal policy optimization (PPO). The key innovation is model-assumption-based regularization (MAR), which aligns the policy with the MPC expert's actions only in states where the controller's model assumptions hold, thereby mitigating catastrophic forgetting while leaving the policy free to improve in states the model does not cover. Evaluated on the full-size Digit humanoid, the method achieves forward running at 1.5 m/s and demonstrates robust locomotion across diverse challenging terrains, including slippery surfaces, inclined slopes, uneven ground, and sand.
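The gated regularization described above can be sketched as a loss term added to the PPO fine-tuning objective. The function names, the squared-error form of the imitation penalty, and the weight `lam` below are illustrative assumptions for exposition, not the paper's implementation:

```python
import numpy as np

def mar_loss(policy_actions, mpc_actions, assumption_holds, lam=0.5):
    """Model-assumption-based regularization (MAR), sketched.

    Penalizes deviation from the MPC expert's actions, but only in
    states where the model assumption is judged to hold (mask = 1),
    so the policy remains free to deviate, and improve, elsewhere.
    """
    policy_actions = np.asarray(policy_actions, dtype=float)
    mpc_actions = np.asarray(mpc_actions, dtype=float)
    mask = np.asarray(assumption_holds, dtype=float)  # 1.0 where assumption holds
    # Per-state squared distance between policy and expert actions
    sq_err = np.sum((policy_actions - mpc_actions) ** 2, axis=-1)
    return lam * float(np.mean(mask * sq_err))

def finetune_loss(ppo_loss, policy_actions, mpc_actions, assumption_holds, lam=0.5):
    """Fine-tuning objective: PPO surrogate loss plus the gated imitation term."""
    return ppo_loss + mar_loss(policy_actions, mpc_actions, assumption_holds, lam)
```

For example, in a batch where the assumption holds only for the first state, only that state's action error contributes to the penalty; states outside the model's validity (e.g. on sand or during slips, in this paper's setting) are optimized by the RL reward alone.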

📝 Abstract
Humanoid locomotion is a challenging task due to its inherent complexity and high-dimensional dynamics, as well as the need to adapt to diverse and unpredictable environments. In this work, we introduce a novel learning framework for effectively training a humanoid locomotion policy that imitates the behavior of a model-based controller while extending its capabilities to handle more complex locomotion tasks, such as more challenging terrain and higher velocity commands. Our framework consists of three key components: pre-training through imitation of the model-based controller, fine-tuning via reinforcement learning, and model-assumption-based regularization (MAR) during fine-tuning. In particular, MAR aligns the policy with actions from the model-based controller only in states where the model assumption holds to prevent catastrophic forgetting. We evaluate the proposed framework through comprehensive simulation tests and hardware experiments on a full-size humanoid robot, Digit, demonstrating a forward speed of 1.5 m/s and robust locomotion across diverse terrains, including slippery, sloped, uneven, and sandy terrains.
Problem

Research questions and friction points this paper is trying to address.

Developing a framework for humanoid locomotion policy training
Adapting to diverse and unpredictable environments effectively
Preventing catastrophic forgetting during policy fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pretraining via model-based controller imitation
Fine-tuning with reinforcement learning
Model-assumption-based regularization prevents forgetting