🤖 AI Summary
This paper addresses the weak generalization of contextual features in model-agnostic, category-level object pose estimation under partial visibility. We propose a “reconstruct-then-aggregate” paradigm that leverages category-level priors to enhance global semantic-geometric context. Our key contributions are: (1) the Semantic Shape Reconstruction (SSR) module—the first to employ a learnable linear deformation model for category-prototype-guided joint reconstruction from RGB-D inputs; and (2) the Global Context Enhancement (GCE) module, enabling cross-modal fusion of global and local features. The entire framework is end-to-end differentiable, supporting joint geometric and semantic optimization. Evaluated on HouseCat6D and NOCS-REAL275, our method achieves a 12.7% improvement in ADD-S accuracy and demonstrates significantly enhanced robustness under occlusion and truncation.
📝 Abstract
A key challenge in model-free category-level pose estimation is the extraction of contextual object features that generalize across varying instances within a specific category. Recent approaches leverage foundational features to capture semantic and geometry cues from data. However, these approaches fail under partial visibility. We overcome this with a first-complete-then-aggregate strategy for feature extraction utilizing class priors. In this paper, we present GCE-Pose, a method that enhances pose estimation for novel instances by integrating category-level global context prior. GCE-Pose performs semantic shape reconstruction with a proposed Semantic Shape Reconstruction (SSR) module. Given an unseen partial RGB-D object instance, our SSR module reconstructs the instance's global geometry and semantics by deforming category-specific 3D semantic prototypes through a learned deep Linear Shape Model. We further introduce a Global Context Enhanced (GCE) feature fusion module that effectively fuses features from partial RGB-D observations and the reconstructed global context. Extensive experiments validate the impact of our global context prior and the effectiveness of the GCE fusion module, demonstrating that GCE-Pose significantly outperforms existing methods on challenging real-world datasets HouseCat6D and NOCS-REAL275. Our project page is available at https://colin-de.github.io/GCE-Pose/.