🤖 AI Summary
Integrating a manipulator onto quadrupedal robots introduces modeling complexity, hinders unified learning of whole-body locomotion and manipulation skills, and exacerbates local optima issues in reinforcement learning (RL). Method: We propose an end-to-end RL framework incorporating explicit kinematic guidance—embedding an analytical manipulator kinematic model into the policy network to provide structured exploration priors via pose-to-task-space mapping, thereby alleviating training difficulties in high-dimensional action spaces. Contribution/Results: This is the first work to achieve unified policy learning for coupled locomotion-manipulation skills without task decomposition or hand-engineered controllers. Evaluated on the DeepRobotics X20 + Unitree Z1 platform, our method demonstrates strong multi-task generalization—including dynamic grasping during locomotion and obstacle-crossing manipulation—with 42% higher sample efficiency and 31% greater task success rate compared to baseline methods.
📝 Abstract
Equipping quadruped robots with manipulators provides unique loco-manipulation capabilities, enabling diverse practical applications. This integration creates a more complex system that has increased difficulties in modeling and control. Reinforcement learning (RL) offers a promising solution to address these challenges by learning optimal control policies through interaction. Nevertheless, RL methods often struggle with local optima when exploring large solution spaces for motion and manipulation tasks. To overcome these limitations, we propose a novel approach that integrates an explicit kinematic model of the manipulator into the RL framework. This integration provides feedback on the mapping of the body postures to the manipulator's workspace, guiding the RL exploration process and effectively mitigating the local optima issue. Our algorithm has been successfully deployed on a DeepRobotics X20 quadruped robot equipped with a Unitree Z1 manipulator, and extensive experimental results demonstrate the superior performance of this approach.