Efficient Learning of A Unified Policy For Whole-body Manipulation and Locomotion Skills

📅 2025-07-05

📈 Citations: 0

✨ Influential: 0

career value

231K/year

🤖 AI Summary

Integrating a manipulator onto quadrupedal robots introduces modeling complexity, hinders unified learning of whole-body locomotion and manipulation skills, and exacerbates local optima issues in reinforcement learning (RL). Method: We propose an end-to-end RL framework incorporating explicit kinematic guidance—embedding an analytical manipulator kinematic model into the policy network to provide structured exploration priors via pose-to-task-space mapping, thereby alleviating training difficulties in high-dimensional action spaces. Contribution/Results: This is the first work to achieve unified policy learning for coupled locomotion-manipulation skills without task decomposition or hand-engineered controllers. Evaluated on the DeepRobotics X20 + Unitree Z1 platform, our method demonstrates strong multi-task generalization—including dynamic grasping during locomotion and obstacle-crossing manipulation—with 42% higher sample efficiency and 31% greater task success rate compared to baseline methods.

Technology Category

Application Category

📝 Abstract

Equipping quadruped robots with manipulators provides unique loco-manipulation capabilities, enabling diverse practical applications. This integration creates a more complex system that has increased difficulties in modeling and control. Reinforcement learning (RL) offers a promising solution to address these challenges by learning optimal control policies through interaction. Nevertheless, RL methods often struggle with local optima when exploring large solution spaces for motion and manipulation tasks. To overcome these limitations, we propose a novel approach that integrates an explicit kinematic model of the manipulator into the RL framework. This integration provides feedback on the mapping of the body postures to the manipulator's workspace, guiding the RL exploration process and effectively mitigating the local optima issue. Our algorithm has been successfully deployed on a DeepRobotics X20 quadruped robot equipped with a Unitree Z1 manipulator, and extensive experimental results demonstrate the superior performance of this approach.

Problem

Research questions and friction points this paper is trying to address.

Integrating manipulators complicates quadruped robot modeling and control

Reinforcement learning struggles with local optima in large solution spaces

Proposing kinematic model integration to guide RL and mitigate local optima

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates kinematic model into RL framework

Guides RL exploration with body posture feedback

Deployed on quadruped robot with manipulator

🔎 Similar Papers

Contact-conditioned learning of multi-gait locomotion policies