Category-Level 3D Correspondence in Camera Space via Morphable Object Priors

📅 2026-05-27
🤖 AI Summary
Existing methods struggle to infer fine-grained, category-level 3D semantic correspondences from a single image without correspondence supervision, which limits reasoning about object parts, functions, and interactions. This work proposes Morpheus, a method that disentangles canonical shape, deformation, and pose to learn a morphable category-level shape prior; semantically consistent 3D correspondences then emerge implicitly through a shared canonical space, without explicit correspondence supervision. The work also introduces HouseCorr3D, the first large-scale benchmark for monocular category-level 3D correspondence on household objects (178k images, 50 categories, 280 instances), with amodal correspondence labels for occluded regions and explicit symmetry annotations. On HouseCorr3D, Morpheus sets a new state of the art, indicating that category-level semantic 3D understanding can arise without direct correspondence supervision. The dataset and code are publicly released.
📝 Abstract
Understanding 3D objects from images is fundamental to robotics and AR/VR applications. While recent work has made progress in category-level pose estimation, current representations fail to capture the fine-grained semantics needed for reasoning about object parts, functions, and interactions. In this work, we study category-level 3D correspondence in camera space -- predicting, from a single image, 3D locations that remain consistent across instances within a category -- and show that it can emerge without explicit correspondence supervision by learning a shared morphable object prior. To enable research in this direction, we introduce HouseCorr3D, the first large-scale benchmark for monocular category-level 3D correspondence with 178k images across 50 household object categories, 280 unique instances, and 3D keypoint annotations directly on CAD models. Crucially, HouseCorr3D provides amodal correspondence labels for occluded regions and explicit symmetry annotations, addressing key limitations of existing datasets. We further propose Morpheus, a method that learns morphable category-level shape priors by disentangling canonical shape, deformation, and object pose. Through this shared canonical grounding, semantically meaningful 3D correspondences in camera space emerge implicitly. These emerging 3D correspondences set a new state of the art on HouseCorr3D, demonstrating that semantic 3D object understanding can arise without direct correspondence supervision. Data and code are publicly available at https://github.com/GenIntel/HouseCorr3D.
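To make the idea of a shared canonical grounding concrete, here is a minimal sketch of how disentangled canonical shape, deformation, and pose could yield correspondences in camera space. It is not the Morpheus implementation: the function names, the per-point additive deformation, the rigid pose model R (x + d) + t, and all numeric values are assumptions chosen for illustration. In a real system these quantities would be predicted from an image rather than written by hand.

```python
import math

def rotation_z(theta):
    """3x3 rotation about the z-axis, as nested lists (hypothetical helper)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def to_camera_space(canonical, deformation, rotation, translation):
    """Map each canonical point x with offset d to R (x + d) + t."""
    camera_points = []
    for x, d in zip(canonical, deformation):
        deformed = [xi + di for xi, di in zip(x, d)]
        rotated = [sum(r * v for r, v in zip(row, deformed)) for row in rotation]
        camera_points.append([rv + tv for rv, tv in zip(rotated, translation)])
    return camera_points

# Assumed canonical template shared by every instance of the category.
canonical = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Instance A: undeformed, 2 units in front of the camera.
deform_a = [[0.0, 0.0, 0.0] for _ in canonical]
points_a = to_camera_space(canonical, deform_a, rotation_z(0.0), [0.0, 0.0, 2.0])

# Instance B: stretched by 10 percent, rotated 90 degrees, placed elsewhere.
deform_b = [[0.1 * v for v in x] for x in canonical]
points_b = to_camera_space(canonical, deform_b, rotation_z(math.pi / 2), [0.5, 0.0, 3.0])

# Index i refers to the same canonical point in both instances, so
# (points_a[i], points_b[i]) is a 3D correspondence in camera space.
for i, (pa, pb) in enumerate(zip(points_a, points_b)):
    print(i, [round(v, 3) for v in pa], [round(v, 3) for v in pb])
```

Because both instances are generated from the same canonical point set, the correspondence is given by the shared index and needs no explicit matching step. This is one way to read the claim that correspondences emerge implicitly from the canonical grounding.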
Problem

Research questions and friction points this paper is trying to address.

category-level 3D correspondence
camera space
morphable object priors
semantic understanding
monocular 3D reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

morphable object priors
category-level 3D correspondence
canonical shape disentanglement
HouseCorr3D
unsupervised 3D semantics