GDRNPP: A Geometry-guided and Fully Learning-based Object Pose Estimator

📅 2021-02-24
📈 Citations: 317
Influential: 76
🤖 AI Summary
Conventional methods for 6D rigid-object pose estimation from monocular RGB images suffer from high computational overhead and lack end-to-end differentiability, hindering joint optimization. Method: This paper introduces GDRN, a fully learning-based, geometry-guided direct regression network that predicts intermediate geometric maps (dense coordinate maps) and regresses the 6D pose from them end to end, entirely eliminating hand-crafted post-processing (e.g., RANSAC/PnP). When extra depth data is available, a differentiable, geometry-guided refinement module uses the predicted coordinate map to establish robust 3D–3D correspondences between the observed and rendered RGB-D images and refines the pose within the same end-to-end architecture. Contribution/Results: By embedding geometric priors into both the network architecture and the loss functions, the full pipeline, GDRNPP, achieves high-accuracy, high-efficiency, purely data-driven pose estimation. It ranked first in the BOP Challenge for two consecutive years, outperforming all hybrid methods that integrate traditional geometric optimization in both accuracy and inference speed.
📝 Abstract
6D pose estimation of rigid objects is a long-standing and challenging task in computer vision. Recently, the emergence of deep learning reveals the potential of Convolutional Neural Networks (CNNs) to predict reliable 6D poses. Given that direct pose regression networks currently exhibit suboptimal performance, most methods still resort to traditional techniques to varying degrees. For example, top-performing methods often adopt an indirect strategy by first establishing 2D-3D or 3D-3D correspondences followed by applying the RANSAC-based PnP or Kabsch algorithms, and further employing ICP for refinement. Despite the performance enhancement, the integration of traditional techniques makes the networks time-consuming and not end-to-end trainable. Orthogonal to them, this paper introduces a fully learning-based object pose estimator. In this work, we first perform an in-depth investigation of both direct and indirect methods and propose a simple yet effective Geometry-guided Direct Regression Network (GDRN) to learn the 6D pose from monocular images in an end-to-end manner. Afterwards, we introduce a geometry-guided pose refinement module, enhancing pose accuracy when extra depth data is available. Guided by the predicted coordinate map, we build an end-to-end differentiable architecture that establishes robust and accurate 3D-3D correspondences between the observed and rendered RGB-D images to refine the pose. Our enhanced pose estimation pipeline GDRNPP (GDRN Plus Plus) conquered the leaderboard of the BOP Challenge for two consecutive years, becoming the first to surpass all prior methods that relied on traditional techniques in both accuracy and speed. The code and models are available at https://github.com/shanice-l/gdrnpp_bop2022.
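The Kabsch algorithm mentioned in the abstract is the classical, non-learned way to recover a rigid pose from 3D–3D correspondences, and it is the step that GDRNPP's refinement module replaces with a differentiable counterpart. A minimal NumPy sketch of the closed-form SVD solution (function name and point conventions are illustrative, not taken from the GDRNPP codebase):

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid transform aligning src to dst.

    src, dst: (N, 3) arrays of corresponding 3D points.
    Returns R (3x3) and t (3,) such that dst ~= src @ R.T + t.
    """
    src_c = src.mean(axis=0)                   # centroids
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

Because every operation here (means, matrix products, SVD) has well-defined gradients, a weighted variant of this solve can sit inside a network and be trained through, which is the sense in which the paper's 3D–3D refinement is end-to-end differentiable.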
Problem

Research questions and friction points this paper is trying to address.

Direct pose regression networks currently underperform, so most methods still fall back on traditional techniques (RANSAC-based PnP, Kabsch, ICP) to varying degrees.
Integrating these traditional techniques makes pipelines time-consuming and not end-to-end trainable.
Depth-based pose refinement has likewise relied on non-differentiable post-processing such as ICP.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometry-guided Direct Regression Network (GDRN)
End-to-end differentiable architecture
Geometry-guided pose refinement module
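The refinement module above matches 3D points between the observed and rendered RGB-D images; building those points from a depth map is the standard pinhole back-projection. A short sketch under assumed conventions (depth in meters, intrinsics K; the helper name and shapes are illustrative, not GDRNPP's API):

```python
import numpy as np

def backproject(depth, K):
    """Lift a depth map (H, W) to an (H*W, 3) camera-space point cloud
    using pinhole intrinsics K (3x3): X = (u - cx) * Z / fx, etc."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.ravel()
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)
```

Back-projecting both the observed depth and the depth rendered at the current pose estimate yields the two point sets whose correspondences drive the differentiable 3D–3D refinement.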
Gu Wang
Tsinghua University
Vision in Robotics · 3D Vision · Pose Estimation
Fabian Manhardt
Google
Federico Tombari
Google, TU Munich
Computer Vision · Machine Learning · 3D Perception
Xiangyang Ji
Department of Automation, Tsinghua University, Beijing 100084, China, and also with BNRist, Beijing 100084, China