Efficient Learning of A Unified Policy For Whole-body Manipulation and Locomotion Skills

📅 2025-07-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Integrating a manipulator onto quadrupedal robots introduces modeling complexity, hinders unified learning of whole-body locomotion and manipulation skills, and exacerbates local optima issues in reinforcement learning (RL). Method: We propose an end-to-end RL framework incorporating explicit kinematic guidance—embedding an analytical manipulator kinematic model into the policy network to provide structured exploration priors via pose-to-task-space mapping, thereby alleviating training difficulties in high-dimensional action spaces. Contribution/Results: This is the first work to achieve unified policy learning for coupled locomotion-manipulation skills without task decomposition or hand-engineered controllers. Evaluated on the DeepRobotics X20 + Unitree Z1 platform, our method demonstrates strong multi-task generalization—including dynamic grasping during locomotion and obstacle-crossing manipulation—with 42% higher sample efficiency and 31% greater task success rate compared to baseline methods.

Technology Category

Application Category

📝 Abstract
Equipping quadruped robots with manipulators provides unique loco-manipulation capabilities, enabling diverse practical applications. This integration creates a more complex system that has increased difficulties in modeling and control. Reinforcement learning (RL) offers a promising solution to address these challenges by learning optimal control policies through interaction. Nevertheless, RL methods often struggle with local optima when exploring large solution spaces for motion and manipulation tasks. To overcome these limitations, we propose a novel approach that integrates an explicit kinematic model of the manipulator into the RL framework. This integration provides feedback on the mapping of the body postures to the manipulator's workspace, guiding the RL exploration process and effectively mitigating the local optima issue. Our algorithm has been successfully deployed on a DeepRobotics X20 quadruped robot equipped with a Unitree Z1 manipulator, and extensive experimental results demonstrate the superior performance of this approach.
Problem

Research questions and friction points this paper is trying to address.

Integrating manipulators complicates quadruped robot modeling and control
Reinforcement learning struggles with local optima in large solution spaces
Proposing kinematic model integration to guide RL and mitigate local optima
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates kinematic model into RL framework
Guides RL exploration with body posture feedback
Deployed on quadruped robot with manipulator
🔎 Similar Papers
No similar papers found.
D
Dianyong Hou
Institute of Cyber-Systems and Control, Zhejiang University, China
C
Chengrui Zhu
Institute of Cyber-Systems and Control, Zhejiang University, China
Z
Zhen Zhang
Institute of Cyber-Systems and Control, Zhejiang University, China
Zhibin Li
Zhibin Li
Professor in School of Transportation, Southeast University
Intelligent Transportation SystemTraffic ControlTraffic SafetyTraffic FlowData Mining
C
Chuang Guo
Institute of Cyber-Systems and Control, Zhejiang University, China
Y
Yong Liu
State Key Laboratory of Industrial Control Technology of Zhejiang University, China