CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World

📅 2025-02-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two key challenges in real-world dexterous manipulation: (1) low-quality single-view point clouds due to limited sensor resolution, occlusion by the dexterous hand, and suboptimal viewing angles; and (2) the absence of contact information and explicit hand-object spatial correspondence in global point cloud representations. To this end, we propose an interaction-aware point cloud representation method: (i) we introduce object-centric contact maps to explicitly encode physical interactions; (ii) we jointly model coordinated hand-arm dynamics; and (iii) we integrate 6D object pose estimation with proprioceptive sensing to enable end-to-end visuomotor policy learning. Evaluated on four real-world dexterous manipulation tasks, our approach achieves a mean success rate of 90%, significantly outperforming all baselines. It demonstrates strong generalization and robustness across multi-object setups, varying viewpoints, and complex scenes.

📝 Abstract
Achieving human-level dexterity in robots is a key objective in the field of robotic manipulation. Recent advancements in 3D-based imitation learning have shown promising results, providing an effective pathway to achieve this goal. However, obtaining high-quality 3D representations presents two key problems: (1) the quality of point clouds captured by a single-view camera is significantly affected by factors such as camera resolution, positioning, and occlusions caused by the dexterous hand; (2) the global point clouds lack crucial contact information and spatial correspondences, which are necessary for fine-grained dexterous manipulation tasks. To address these limitations, we propose CordViP, a novel framework that constructs and learns correspondences by leveraging the robust 6D pose estimation of objects and robot proprioception. Specifically, we first introduce interaction-aware point clouds, which establish correspondences between the object and the hand. These point clouds are then used for our pre-training policy, where we also incorporate object-centric contact maps and hand-arm coordination information, effectively capturing both spatial and temporal dynamics. Our method demonstrates exceptional dexterous manipulation capabilities with an average success rate of 90% in four real-world tasks, surpassing other baselines by a large margin. Experimental results also highlight the superior generalization and robustness of CordViP to different objects, viewpoints, and scenarios. Code and videos are available at https://aureleopku.github.io/CordViP.
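The abstract's core idea — building an interaction-aware point cloud from an estimated 6D object pose plus hand points derived from proprioception — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name, the one-hot hand/object labels, and the assumption that hand surface points come from forward kinematics are all hypothetical choices for the example.

```python
import numpy as np

def interaction_aware_cloud(obj_points, R, t, hand_points):
    """Combine object and hand points in a shared world frame.

    obj_points  : (N, 3) canonical object model points
    R, t        : estimated 6D object pose (3x3 rotation, 3-vector translation)
    hand_points : (M, 3) hand surface points, e.g. obtained via forward
                  kinematics on proprioceptive joint readings (assumed given)
    Returns an (N+M, 5) array: xyz plus a one-hot [object, hand] label.
    """
    obj_world = obj_points @ R.T + t  # bring model points into the world frame
    obj_feat = np.concatenate(
        [obj_world, np.tile([1.0, 0.0], (len(obj_world), 1))], axis=1)
    hand_feat = np.concatenate(
        [hand_points, np.tile([0.0, 1.0], (len(hand_points), 1))], axis=1)
    # stacking both segments preserves explicit hand-object correspondence
    return np.concatenate([obj_feat, hand_feat], axis=0)
```

Because both segments live in one frame and carry segment labels, a downstream policy can reason about their relative geometry directly instead of recovering it from a raw depth scan.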
Problem

Research questions and friction points this paper is trying to address.

Achieve human-level dexterity in robots
Resolve issues with 3D representation quality
Enhance fine-grained manipulation via correspondence learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

6D pose estimation
interaction-aware point clouds
object-centric contact maps
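One plausible reading of an object-centric contact map is a per-point score on the object surface indicating proximity to the hand. The sketch below is an assumption for illustration (the kernel form, the temperature `tau`, and the function name are not from the paper): each object point gets a score near 1 when a hand point touches it and decaying toward 0 with distance.

```python
import numpy as np

def contact_map(obj_points, hand_points, tau=0.01):
    """Per-object-point contact score from hand proximity (illustrative).

    obj_points  : (N, 3) object points
    hand_points : (M, 3) hand points
    tau         : distance scale in meters controlling score falloff
    Returns an (N,) array of scores in (0, 1].
    """
    # pairwise distances between every object point and every hand point
    d = np.linalg.norm(obj_points[:, None, :] - hand_points[None, :, :], axis=-1)
    nearest = d.min(axis=1)      # distance from each object point to the hand
    return np.exp(-nearest / tau)  # ~1 at contact, decays with separation
```

Such a map gives the policy an explicit, pose-invariant signal about where on the object the hand is making (or about to make) contact.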
Yankai Fu
School of Computer Science, Peking University; Wuhan University
Qiuxuan Feng
School of Computer Science, Peking University; Tianjin University
Ning Chen
School of Computer Science, Peking University
Zichen Zhou
Beijing Institute of Technology
Mengzhen Liu
School of Computer Science, Peking University
Mingdong Wu
Peking University (Embodied AI · Reinforcement Learning · Generative Model)
Tianxing Chen
The University of Hong Kong
Shanyu Rong
School of Computer Science, Peking University
Jiaming Liu
School of Computer Science, Peking University
Hao Dong
School of Computer Science, Peking University
Shanghang Zhang
Peking University (Embodied AI · Foundation Models)