Cost-Matching Model Predictive Control for Efficient Reinforcement Learning in Humanoid Locomotion

📅 2026-03-30
🤖 AI Summary
This work addresses the inefficiency and instability of reinforcement learning for humanoid robots caused by high-dimensional action spaces and model mismatch. The authors propose a framework that integrates parameterized model predictive control (MPC) with reinforcement learning. A cost-matching mechanism fits the MPC-predicted cost-to-go to the action-value function estimated from closed-loop data, which allows gradient-based parameter updates without repeatedly solving the MPC optimization problem during training. The parameterized MPC is built on centroidal dynamics and trained by gradient descent. In simulation, the method improves on manually tuned baselines in locomotion performance and in robustness to model mismatch and external disturbances.
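One way to write the cost-matching objective described above is sketched here. The notation is an assumption and is not taken from the paper: θ denotes the MPC cost parameters, ℓ_θ and V^f_θ a parameterized stage and terminal cost, N the horizon, D the set of recorded time steps, and G_t the return measured from closed-loop data. The squared-error form is also an assumption, since the source says only that the discrepancy is minimized.

```latex
% Assumed notation: MPC cost-to-go evaluated along a recorded trajectory
\hat{Q}_\theta(s_t, a_t) = \sum_{k=0}^{N-1} \ell_\theta(s_{t+k}, a_{t+k}) + V^f_\theta(s_{t+N})

% Assumed squared-error matching objective over the recorded data set D
\min_\theta \; \frac{1}{|D|} \sum_{t \in D} \bigl( \hat{Q}_\theta(s_t, a_t) - G_t \bigr)^2
```

Because the cost-to-go is evaluated on recorded states and actions, the objective can be differentiated with respect to θ without solving an optimization problem at each sample.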
📝 Abstract
In this paper, we propose a cost-matching approach for optimal humanoid locomotion within a Model Predictive Control (MPC)-based Reinforcement Learning (RL) framework. A parameterized MPC formulation with centroidal dynamics is trained to approximate the action-value function obtained from high-fidelity closed-loop data. Specifically, the MPC cost-to-go is evaluated along recorded state-action trajectories, and the parameters are updated to minimize the discrepancy between MPC-predicted values and measured returns. This formulation enables efficient gradient-based learning while avoiding the computational burden of repeatedly solving the MPC problem during training. The proposed method is validated in simulation using a commercial humanoid platform. Results demonstrate improved locomotion performance and robustness to model mismatch and external disturbances compared with manually tuned baselines.
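The update described in the abstract can be illustrated with a minimal NumPy sketch. It is a toy under stated assumptions, not the authors' implementation: the diagonal quadratic stage and terminal costs, the random stand-in trajectory, the synthetic returns generated from hidden weights, and all function names are hypothetical. It uses no centroidal dynamics and no simulator; it only shows that evaluating a parameterized cost-to-go along recorded data and regressing it onto returns needs no MPC solve.

```python
import numpy as np

def cost_to_go_features(states, actions, t, horizon):
    """Features phi with Q_hat = phi @ theta for diagonal quadratic costs (assumed form)."""
    s = states[t:t + horizon]
    a = actions[t:t + horizon]
    stage_state = (s ** 2).sum(axis=0)      # multiplies the state stage weights
    stage_action = (a ** 2).sum(axis=0)     # multiplies the action stage weights
    terminal = states[t + horizon] ** 2     # multiplies the terminal weights
    return np.concatenate([stage_state, stage_action, terminal])

def matching_loss(theta, features, returns):
    """Mean squared discrepancy between predicted cost-to-go and returns."""
    return float(np.mean((features @ theta - returns) ** 2))

def fit_cost_weights(features, returns, steps=500):
    """Projected gradient descent on the matching loss; weights stay nonnegative."""
    n = len(returns)
    # Step size 1/L, where L = 2 * sigma_max^2 / n is the gradient's Lipschitz constant.
    step = n / (2.0 * np.linalg.norm(features, ord=2) ** 2)
    theta = np.ones(features.shape[1])
    for _ in range(steps):
        grad = 2.0 * features.T @ (features @ theta - returns) / n
        theta = np.maximum(theta - step * grad, 0.0)
    return theta

# Synthetic stand-in for recorded closed-loop data (2 states, 1 action).
rng = np.random.default_rng(0)
T, horizon = 200, 5
states = rng.normal(size=(T + 1, 2))
actions = rng.normal(size=(T, 1))
features = np.stack([cost_to_go_features(states, actions, t, horizon)
                     for t in range(T - horizon + 1)])

# Hypothetical "measured" returns, generated from hidden weights for the demo.
hidden_weights = np.array([2.0, 0.5, 0.1, 3.0, 1.0])
returns = features @ hidden_weights

loss_before = matching_loss(np.ones(features.shape[1]), features, returns)
theta = fit_cost_weights(features, returns)
loss_after = matching_loss(theta, features, returns)
print(loss_before, loss_after)
```

With diagonal quadratic costs the predicted cost-to-go is linear in the weights, so the fit reduces to constrained least squares. A richer parameterization would need automatic differentiation, but the training loop would keep the same shape.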
Problem

Research questions and friction points this paper is trying to address.

humanoid locomotion
reinforcement learning
model predictive control
cost-to-go
robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

cost-matching
model predictive control
reinforcement learning
humanoid locomotion
centroidal dynamics