🤖 AI Summary
This work addresses the challenge of achieving stable and smooth transitions among diverse motor skills (such as walking, running, and fighting) in humanoid robots during prolonged dynamic interactions. Existing approaches often suffer from transition instability because the terminal state of one skill does not match the initial state of the next. To overcome this, the authors propose RPG, a Mixture-of-Experts policy framework that combines motion transition randomization with temporal randomization. This allows a single unified policy to be trained for arbitrary-duration execution, on-demand interruption, and seamless switching among skills. The authors report smoother transitions and improved overall stability in simulation experiments, and they deploy the method on the Unitree G1 humanoid robot to validate its robustness and practical applicability in the real world.
📝 Abstract
Humanoid robots have demonstrated impressive motor skills in a wide range of tasks, yet whole-body control for humanlike, long-duration, dynamic fighting remains particularly challenging due to its stringent requirements on agility and stability. While imitation learning enables robots to execute humanlike fighting skills, existing approaches often rely on switching among multiple single-skill policies or on a general policy that imitates input reference motions. These strategies suffer from instability when transitioning between skills, because the mismatch between initial and terminal states across skills or reference motions introduces out-of-domain disturbances, resulting in non-smooth or unstable behaviors. In this work, we propose RPG, a hybrid expert policy framework for smooth and stable transitions among multiple humanoid skills. Our approach incorporates motion transition randomization and temporal randomization to train a unified policy that generates agile fighting actions while remaining stable and smooth during skill transitions. Furthermore, we design a control pipeline that integrates walking and running locomotion with fighting skills, allowing humanlike combat of arbitrary duration that can be seamlessly interrupted, or switched to a different action policy, at any time. Extensive experiments in simulation demonstrate the effectiveness of the proposed framework, and real-world deployment on the Unitree G1 humanoid robot further validates its robustness and applicability.
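The abstract does not spell out how motion transition randomization and temporal randomization are implemented. The sketch below shows one plausible reading and is not the authors' method: during training, the order of reference skills is drawn at random, so any skill may follow any other, and each skill is entered at a random frame and played for a random duration, so clips can be interrupted mid-motion. All names here (`Segment`, `sample_schedule`, the clip names and lengths) are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Segment:
    skill: str        # name of the reference motion clip (hypothetical)
    start_frame: int  # frame in the clip where playback begins
    num_frames: int   # number of frames played before switching


def sample_schedule(clip_lengths, total_frames, min_frames, rng):
    """Sample a random sequence of skill segments covering total_frames."""
    skills = sorted(clip_lengths)
    schedule = []
    remaining = total_frames
    previous = None
    while remaining > 0:
        # Transition randomization (assumed form): any skill may follow
        # any other, so the policy sees many terminal/initial state pairs.
        candidates = [s for s in skills if s != previous] or skills
        skill = rng.choice(candidates)
        length = clip_lengths[skill]
        # Temporal randomization (assumed form): random entry frame and
        # random duration, so a clip can be entered or cut off mid-motion.
        start = rng.randrange(length)
        max_frames = min(length - start, remaining)
        low = max(1, min(min_frames, max_frames))
        num = rng.randint(low, max_frames)
        schedule.append(Segment(skill, start, num))
        remaining -= num
        previous = skill
    return schedule


# Hypothetical clip lengths in frames; not taken from the paper.
clips = {"walk": 120, "run": 90, "punch": 45}
schedule = sample_schedule(clips, total_frames=600, min_frames=10,
                           rng=random.Random(0))
```

Each `Segment` in `schedule` could then drive the reference motion that the policy tracks during a training episode. Under this reading, exposing the policy to many random hand-offs is what keeps skill switches inside the training distribution.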