🤖 AI Summary
This paper addresses category-level 6D pose estimation and detection from a single RGB image. The authors propose the first end-to-end unified framework that requires neither RGB-D input nor a two-stage pipeline. Methodologically, detection and pose estimation are modeled jointly within a single network: (i) a differentiable neural mesh representation for a 3D prototype library; (ii) feature-alignment-driven differentiable rendering; and (iii) a multi-model RANSAC optimizer that enables cross-task collaborative optimization. By eliminating the reliance on depth data and avoiding error propagation from post-processing, the approach significantly improves robustness and generalization. On the REAL275 benchmark, the method outperforms prior art by 22.9% averaged across all scale-agnostic metrics, establishing a new state of the art for category-level 6D pose estimation from monocular RGB images.
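The multi-model RANSAC idea in (iii), proposing pose hypotheses from several category prototypes and keeping the one that best explains the observed features, can be sketched in a toy 2D setting. Everything below (2D similarity transforms standing in for full 6D poses, the function names, the inlier threshold) is illustrative and not taken from the paper's actual implementation.

```python
import numpy as np

def fit_similarity(src, dst):
    """Estimate a 2D similarity transform (scale, rotation, translation)
    from two point correspondences, via complex numbers: dst = m*src + c."""
    a = src[:, 0] + 1j * src[:, 1]
    b = dst[:, 0] + 1j * dst[:, 1]
    if abs(a[1] - a[0]) < 1e-12:  # degenerate sample
        return None
    m = (b[1] - b[0]) / (a[1] - a[0])
    c = b[0] - m * a[0]
    return m, c

def project(m, c, pts):
    """Apply the similarity transform to an (N, 2) point array."""
    z = pts[:, 0] + 1j * pts[:, 1]
    w = m * z + c
    return np.stack([w.real, w.imag], axis=1)

def multi_model_ransac(prototypes, detections, iters=100, thresh=0.1, seed=0):
    """Jointly select the prototype (detection) and its transform (pose)
    with the largest inlier count; correspondences are index-aligned."""
    rng = np.random.default_rng(seed)
    best = (None, None, -1)  # (prototype index, transform, inlier count)
    for k, proto in enumerate(prototypes):
        n = min(len(proto), len(detections))
        for _ in range(iters):
            idx = rng.choice(n, size=2, replace=False)
            fit = fit_similarity(proto[idx], detections[idx])
            if fit is None:
                continue
            resid = np.linalg.norm(project(*fit, proto[:n]) - detections[:n], axis=1)
            inliers = int((resid < thresh).sum())
            if inliers > best[2]:
                best = (k, fit, inliers)
    return best
```

Selecting the prototype with the most inliers couples detection (which prototype is present) with pose estimation (the transform itself), mirroring, at toy scale, the cross-task optimization described above.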
📝 Abstract
Recognizing objects in images is a fundamental problem in computer vision. Although detecting objects in 2D images is common, many applications require determining their pose in 3D space. Traditional category-level methods rely on RGB-D inputs, which may not always be available, or employ two-stage approaches that use separate models and representations for detection and pose estimation. For the first time, we introduce a unified model that integrates detection and pose estimation into a single framework for RGB images by leveraging neural mesh models with learned features and multi-model RANSAC. Our approach achieves state-of-the-art results for RGB category-level pose estimation on REAL275, improving on the previous state of the art by 22.9% averaged across all scale-agnostic metrics. Finally, we demonstrate that our unified method exhibits greater robustness than single-stage baselines. Our code and models are available at https://github.com/Fischer-Tom/unified-detection-and-pose-estimation.