🤖 AI Summary
Existing motion imitation approaches for humanoid robots often generate physically infeasible control commands in out-of-distribution environments due to the absence of explicit dynamics reasoning. This work proposes a hybrid control framework that integrates reinforcement learning with a centroidal dynamics model, eliminating the need for predefined contact schedules. The policy dynamically modulates a centroidal momentum-based controller while simultaneously predicting continuous contact states and desired centroidal velocities. A physics-informed reward function guides the generation of feasible feedforward joint torques. Experiments on the Booster T1 humanoid demonstrate that the proposed method reduces the average base position tracking error by 13% compared to state-of-the-art reinforcement learning baselines, significantly enhancing robustness and motion feasibility under domain shifts.
📝 Abstract
Motion mimicking, i.e., encouraging the control policy to mimic human motion, facilitates the learning of complex tasks via reinforcement learning (RL) for humanoid robots. Although standard RL frameworks demonstrate impressive locomotion agility, they often bypass explicit reasoning about robot dynamics during deployment, a design choice that can lead to physically infeasible commands when the robot encounters out-of-distribution environments. By integrating model-based principles, hybrid approaches can improve performance; however, existing methods typically rely on predefined contact timing, limiting their versatility. This paper introduces HybridMimic, a framework in which a learned policy dynamically modulates a centroidal-model-based controller by predicting continuous contact states and desired centroidal velocities. This architecture exploits the physical grounding of centroidal dynamics to generate feedforward torques that remain feasible even under domain shift. Using physics-informed rewards, the policy is trained to efficiently utilize the centroidal controller's optimization by outputting precise control targets and reference torques. Through hardware experiments on the Booster T1 humanoid, HybridMimic reduces the average base position tracking error by 13% compared to a state-of-the-art RL baseline, demonstrating the robustness of dynamics-aware deployment.
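The abstract's division of labor (a learned policy that outputs continuous contact states and a desired centroidal velocity, feeding a momentum-based controller that produces the feedforward wrench) can be sketched in a few lines. This is a rough illustration only, not the paper's implementation: the `policy` stand-in, the gains, the mass, and the proportional wrench split across the feet are all assumptions introduced here for clarity, and the real controller would solve a full centroidal-momentum optimization.

```python
import numpy as np

def policy(observation):
    """Hypothetical stand-in for the learned policy: maps an observation to
    continuous contact states in [0, 1] and a desired 6-D centroidal velocity
    (linear + angular). A real policy is a trained network; these are fixed
    illustrative values (left foot loaded, right foot in swing, walking +x)."""
    contact_states = np.array([0.9, 0.1])
    desired_centroidal_vel = np.array([0.3, 0.0, 0.0, 0.0, 0.0, 0.0])
    return contact_states, desired_centroidal_vel

def centroidal_controller(contact_states, desired_vel, current_vel,
                          mass=30.0, gain=5.0):
    """Simplified centroidal-momentum tracking: turn the velocity error into a
    desired rate of change of centroidal momentum, add gravity compensation,
    then split the net contact wrench across the feet in proportion to the
    policy's continuous contact states (no predefined contact schedule)."""
    momentum_rate = mass * gain * (desired_vel - current_vel)  # desired h_dot
    gravity_comp = np.array([0.0, 0.0, mass * 9.81, 0.0, 0.0, 0.0])
    net_wrench = momentum_rate + gravity_comp
    weights = contact_states / max(contact_states.sum(), 1e-6)
    return [w * net_wrench for w in weights]  # per-foot feedforward wrenches

# One control step of the hybrid loop
obs = np.zeros(10)                 # placeholder observation
contacts, v_des = policy(obs)
v_cur = np.zeros(6)                # measured centroidal velocity
wrenches = centroidal_controller(contacts, v_des, v_cur)
```

In this toy version the foot the policy marks as loaded (contact state 0.9) receives most of the vertical force, while a mid-stance value would blend support smoothly between feet, which is the continuous-contact behavior the abstract contrasts with fixed contact timing.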