GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation

📅 2025-02-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the weak generalization of contextual features in model-agnostic, category-level object pose estimation under partial visibility. We propose a “reconstruct-then-aggregate” paradigm that leverages category-level priors to enhance global semantic-geometric context. Our key contributions are: (1) the Semantic Shape Reconstruction (SSR) module—the first to employ a learnable linear deformation model for category-prototype-guided joint reconstruction from RGB-D inputs; and (2) the Global Context Enhancement (GCE) module, enabling cross-modal fusion of global and local features. The entire framework is end-to-end differentiable, supporting joint geometric and semantic optimization. Evaluated on HouseCat6D and NOCS-REAL275, our method achieves a 12.7% improvement in ADD-S accuracy and demonstrates significantly enhanced robustness under occlusion and truncation.

Technology Category

Application Category

📝 Abstract
A key challenge in model-free category-level pose estimation is the extraction of contextual object features that generalize across varying instances within a specific category. Recent approaches leverage foundational features to capture semantic and geometry cues from data. However, these approaches fail under partial visibility. We overcome this with a first-complete-then-aggregate strategy for feature extraction utilizing class priors. In this paper, we present GCE-Pose, a method that enhances pose estimation for novel instances by integrating category-level global context prior. GCE-Pose performs semantic shape reconstruction with a proposed Semantic Shape Reconstruction (SSR) module. Given an unseen partial RGB-D object instance, our SSR module reconstructs the instance's global geometry and semantics by deforming category-specific 3D semantic prototypes through a learned deep Linear Shape Model. We further introduce a Global Context Enhanced (GCE) feature fusion module that effectively fuses features from partial RGB-D observations and the reconstructed global context. Extensive experiments validate the impact of our global context prior and the effectiveness of the GCE fusion module, demonstrating that GCE-Pose significantly outperforms existing methods on challenging real-world datasets HouseCat6D and NOCS-REAL275. Our project page is available at https://colin-de.github.io/GCE-Pose/.
Problem

Research questions and friction points this paper is trying to address.

Enhances pose estimation with global context
Handles partial visibility in object recognition
Reconstructs global geometry from partial RGB-D data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic Shape Reconstruction module
Global Context Enhanced fusion
Linear Shape Model deformation
🔎 Similar Papers
No similar papers found.