🤖 AI Summary
Addressing the challenge of balancing computational efficiency and motion accuracy in real-time safe navigation for humanoid robots, this paper proposes a dynamic subgoal-driven hierarchical navigation framework. At the high level, deep reinforcement learning (PPO/SAC) operates in the robot's body-fixed frame to dynamically generate semantic subgoals; at the low level, model predictive control (MPC), integrated with kinematic modeling, synthesizes stable and robust gait trajectories. A novel model-guided data bootstrapping mechanism is introduced to significantly improve RL training efficiency and policy stability. Evaluated on the Digit robot simulation platform, the method achieves superior navigation success rates and generalization capability compared to both traditional model-based and state-of-the-art end-to-end learning approaches in complex cluttered environments. It simultaneously guarantees real-time performance (>30 Hz) and strong robustness against environmental uncertainties and dynamic disturbances.
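The two-level structure described above can be sketched as a minimal loop: a high-level planner proposes a subgoal in the robot's body-fixed frame at a low rate, and a high-rate low-level tracker (standing in here for the MPC gait planner) walks toward it. All function names, gains, and the heuristic subgoal rule below are illustrative assumptions, not the paper's actual policy or controller.

```python
import math

def high_level_subgoal(robot_xy, robot_yaw, goal_xy, reach=1.0):
    """Return a subgoal in the body-fixed frame, clipped to a reachable radius.
    In the paper, a learned PPO/SAC policy replaces this heuristic."""
    dx = goal_xy[0] - robot_xy[0]
    dy = goal_xy[1] - robot_xy[1]
    # Rotate the world-frame offset into the body frame.
    bx = math.cos(-robot_yaw) * dx - math.sin(-robot_yaw) * dy
    by = math.sin(-robot_yaw) * dx + math.cos(-robot_yaw) * dy
    dist = math.hypot(bx, by)
    if dist > reach:  # keep subgoals within one planning horizon
        bx, by = bx * reach / dist, by * reach / dist
    return bx, by

def navigate(start_xy, goal_xy, tol=0.05, max_outer=100):
    """Outer loop: low-rate subgoal generation; inner loop: high-rate tracking.
    The inner tracker is a stand-in for the MPC-based gait planner."""
    xy, yaw = start_xy, 0.0
    for _ in range(max_outer):
        if math.hypot(goal_xy[0] - xy[0], goal_xy[1] - xy[1]) < tol:
            return xy, True
        bx, by = high_level_subgoal(xy, yaw, goal_xy)
        # Convert the body-frame subgoal to a world-frame waypoint once,
        # then let the low-level tracker take several steps toward it.
        wx = xy[0] + math.cos(yaw) * bx - math.sin(yaw) * by
        wy = xy[1] + math.sin(yaw) * bx + math.cos(yaw) * by
        for _ in range(5):
            dx, dy = wx - xy[0], wy - xy[1]
            d = math.hypot(dx, dy)
            if d < 1e-6:
                break
            step = min(0.2, d)  # bounded step length per control tick
            yaw = math.atan2(dy, dx)
            xy = (xy[0] + step * math.cos(yaw), xy[1] + step * math.sin(yaw))
    return xy, False
```

The key design point the sketch preserves is that the subgoal is expressed in the robot-centric frame, so the high-level policy never needs global coordinates as input.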
📝 Abstract
Safe and real-time navigation is fundamental for humanoid robot applications. However, existing bipedal robot navigation frameworks often struggle to balance computational efficiency with the precision required for stable locomotion. We propose a novel hierarchical framework that continuously generates dynamic subgoals to guide the robot through cluttered environments. Our method comprises a high-level reinforcement learning (RL) planner for subgoal selection in a robot-centric coordinate system and a low-level Model Predictive Control (MPC) based planner that produces robust walking gaits to reach these subgoals. To expedite and stabilize the training process, we incorporate a data bootstrapping technique that leverages a model-based navigation approach to generate a diverse, informative dataset. We validate our method in simulation using the Agility Robotics Digit humanoid across multiple scenarios with random obstacles. Results show that our framework significantly improves navigation success rates and adaptability compared to both the original model-based method and other learning-based methods.
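The data-bootstrapping idea in the abstract can be illustrated with a small sketch: transitions collected by an existing model-based navigation policy pre-fill the RL replay buffer before learning starts, so early policy updates draw on informative experience rather than random exploration. The `ReplayBuffer` class, the one-dimensional toy dynamics, and `model_based_action` below are hypothetical illustrations, not the paper's implementation.

```python
import random

class ReplayBuffer:
    """Minimal FIFO replay buffer for (s, a, r, s') transitions."""
    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.data = []

    def add(self, transition):
        if len(self.data) >= self.capacity:
            self.data.pop(0)  # evict the oldest transition
        self.data.append(transition)

    def sample(self, batch_size):
        return random.sample(self.data, min(batch_size, len(self.data)))

def model_based_action(state):
    """Stand-in for the model-based planner's action: head toward the goal (origin)."""
    return -0.1 * state

def bootstrap(buffer, n_transitions=500, seed=0):
    """Fill the buffer with rollouts from the model-based policy on a 1-D toy task."""
    rng = random.Random(seed)
    state = rng.uniform(-5.0, 5.0)
    for _ in range(n_transitions):
        action = model_based_action(state)
        next_state = state + action
        reward = -abs(next_state)  # closer to the goal is better
        buffer.add((state, action, reward, next_state))
        state = next_state

buffer = ReplayBuffer()
bootstrap(buffer)
```

After bootstrapping, RL training (PPO/SAC in the paper's case) would begin sampling mixed batches of demonstration and on-policy data from this buffer; the exact mixing schedule is a design choice not specified here.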