Universal Features Guided Zero-Shot Category-Level Object Pose Estimation

📅 2025-01-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing challenges in zero-shot category-level 6D pose estimation—including poor cross-category generalization, degenerate 2D correspondences, and pose ambiguity—this paper proposes an RGB-D pose estimation method that requires no fine-tuning for novel categories. Our approach introduces the first iterative optimization framework unifying 2D sparse semantic matching with 3D dense geometric alignment. By leveraging universal visual feature extraction and cross-modal semantic correspondence modeling, it achieves category-agnostic pose estimation. Evaluated on the REAL275 and Wild6D zero-shot benchmarks, our method significantly outperforms existing state-of-the-art approaches, reducing average rotation and translation errors by 21.3% and 18.7%, respectively. The method delivers both high accuracy and strong generalization capability, establishing a scalable foundation for 6D pose estimation in open-world robotic manipulation tasks.

Technology Category

Application Category

📝 Abstract
Object pose estimation, crucial in computer vision and robotics applications, faces challenges with the diversity of unseen categories. We propose a zero-shot method to achieve category-level 6-DOF object pose estimation, which exploits both 2D and 3D universal features of input RGB-D image to establish semantic similarity-based correspondences and can be extended to unseen categories without additional model fine-tuning. Our method begins with combining efficient 2D universal features to find sparse correspondences between intra-category objects and gets initial coarse pose. To handle the correspondence degradation of 2D universal features if the pose deviates much from the target pose, we use an iterative strategy to optimize the pose. Subsequently, to resolve pose ambiguities due to shape differences between intra-category objects, the coarse pose is refined by optimizing with dense alignment constraint of 3D universal features. Our method outperforms previous methods on the REAL275 and Wild6D benchmarks for unseen categories.
Problem

Research questions and friction points this paper is trying to address.

Object Pose Estimation
Unseen Object Types
Lack of Specific Training Data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot Classification
Iterative Pose Refinement
3D Information Correction
🔎 Similar Papers
No similar papers found.
Wentian Qu
Wentian Qu
Institute of Software Chinese Academy of Sciences
Chenyu Meng
Chenyu Meng
Institute of Software, Chinese Academy of Sciences
H
Heng Li
Hong Kong University of Science and Technology
J
Jian Cheng
Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences
C
Cuixia Ma
Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences
H
Hongan Wang
Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Xiao Zhou
Xiao Zhou
M.Phil student in HKUST
Autonomous DrivingDRL
Xiaoming Deng
Xiaoming Deng
Institute of Software, CAS
Computer VisionRobotic ManipulationNatural User InterfacesVirtual HumansHand Tracking
Ping Tan
Ping Tan
Hong Kong University of Science and Technology (HKUST)
Computer VisionComputer Graphics