GDRNPP: A Geometry-guided and Fully Learning-based Object Pose Estimator

📅 2021-02-24
📈 Citations: 317
Influential: 76
🤖 AI Summary
Conventional methods for 6D rigid-object pose estimation from monocular RGB images suffer from high computational overhead and lack end-to-end differentiability, hindering joint optimization. Method: This paper introduces GDRN, a fully learning-based, geometry-guided direct regression network that predicts intermediate geometric maps (dense coordinate maps) and regresses the 6D pose from them end to end, entirely eliminating hand-crafted post-processing (e.g., RANSAC/PnP). When extra depth data is available, a differentiable, geometry-guided refinement module uses the predicted coordinate map to establish robust 3D–3D correspondences between the observed and rendered RGB-D images and refines the pose within the same end-to-end architecture. Contribution/Results: By embedding geometric priors into both the network architecture and the loss functions, the full pipeline, GDRNPP, achieves high-accuracy, high-efficiency, purely data-driven pose estimation. It ranked first in the BOP Challenge for two consecutive years, outperforming all hybrid methods that integrate traditional geometric optimization in both accuracy and inference speed.
📝 Abstract
6D pose estimation of rigid objects is a long-standing and challenging task in computer vision. Recently, the emergence of deep learning reveals the potential of Convolutional Neural Networks (CNNs) to predict reliable 6D poses. Given that direct pose regression networks currently exhibit suboptimal performance, most methods still resort to traditional techniques to varying degrees. For example, top-performing methods often adopt an indirect strategy by first establishing 2D-3D or 3D-3D correspondences followed by applying the RANSAC-based PnP or Kabsch algorithms, and further employing ICP for refinement. Despite the performance enhancement, the integration of traditional techniques makes the networks time-consuming and not end-to-end trainable. Orthogonal to them, this paper introduces a fully learning-based object pose estimator. In this work, we first perform an in-depth investigation of both direct and indirect methods and propose a simple yet effective Geometry-guided Direct Regression Network (GDRN) to learn the 6D pose from monocular images in an end-to-end manner. Afterwards, we introduce a geometry-guided pose refinement module, enhancing pose accuracy when extra depth data is available. Guided by the predicted coordinate map, we build an end-to-end differentiable architecture that establishes robust and accurate 3D-3D correspondences between the observed and rendered RGB-D images to refine the pose. Our enhanced pose estimation pipeline GDRNPP (GDRN Plus Plus) conquered the leaderboard of the BOP Challenge for two consecutive years, becoming the first to surpass all prior methods that relied on traditional techniques in both accuracy and speed. The code and models are available at https://github.com/shanice-l/gdrnpp_bop2022.
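The Kabsch algorithm mentioned in the abstract is the classical, non-learned way to recover a rigid pose from 3D–3D correspondences, and it is the step that GDRNPP's refinement module replaces with a differentiable counterpart. A minimal NumPy sketch of the closed-form SVD solution (function name and point conventions are illustrative, not taken from the GDRNPP codebase):

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid transform aligning src to dst.

    src, dst: (N, 3) arrays of corresponding 3D points.
    Returns R (3x3) and t (3,) such that dst ~= src @ R.T + t.
    """
    src_c = src.mean(axis=0)                   # centroids
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

Because every operation here (means, matrix products, SVD) has well-defined gradients, a weighted variant of this solve can sit inside a network and be trained through, which is the sense in which the paper's 3D–3D refinement is end-to-end differentiable.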
Problem

Research questions and friction points this paper is trying to address.

Direct pose regression networks currently underperform, so most methods still fall back on traditional techniques (RANSAC-based PnP, Kabsch, ICP) to varying degrees.
Integrating these traditional techniques makes pipelines time-consuming and not end-to-end trainable.
Depth-based pose refinement has likewise relied on non-differentiable post-processing such as ICP.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometry-guided Direct Regression Network (GDRN)
End-to-end differentiable architecture
Geometry-guided pose refinement module
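The refinement module above matches 3D points between the observed and rendered RGB-D images; building those points from a depth map is the standard pinhole back-projection. A short sketch under assumed conventions (depth in meters, intrinsics K; the helper name and shapes are illustrative, not GDRNPP's API):

```python
import numpy as np

def backproject(depth, K):
    """Lift a depth map (H, W) to an (H*W, 3) camera-space point cloud
    using pinhole intrinsics K (3x3): X = (u - cx) * Z / fx, etc."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.ravel()
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)
```

Back-projecting both the observed depth and the depth rendered at the current pose estimate yields the two point sets whose correspondences drive the differentiable 3D–3D refinement.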
Gu Wang
Tsinghua University
Vision in Robotics · 3D Vision · Pose Estimation
Fabian Manhardt
Google
Federico Tombari
Google, TU Munich
Computer Vision · Machine Learning · 3D Perception
Xiangyang Ji
Department of Automation, Tsinghua University, Beijing 100084, China, and also with BNRist, Beijing 100084, China