🤖 AI Summary
Existing explicit feature-matching-based 3D-3D correspondence methods suffer from severe performance degradation under challenging conditions—including small overlap, textureless surfaces, and occlusions.
Method: This paper proposes a structure-aware, matching-free 3D-3D correspondence learning framework. Instead of relying on handcrafted or learned feature matching, it leverages structure-guided keypoint detection and graph neural networks to explicitly model geometric relationships among keypoints, enforcing inter- and intra-image structural consistency. Crucially, it introduces, for the first time, a keypoint-driven image reconstruction loss, enabling zero-shot pose estimation for unseen objects.
Results: On CO3D, the method reduces average angular error by 5.7° over state-of-the-art methods. It further demonstrates strong generalization on Objaverse and LineMOD, confirming robustness across diverse object categories and real-world scenarios.
📝 Abstract
Relative pose estimation provides a promising way of achieving object-agnostic pose estimation. Despite the success of existing 3D correspondence-based methods, their reliance on explicit feature matching suffers from small overlaps in visible regions and unreliable feature estimation for invisible regions. Inspired by humans' ability to assemble two object parts that have small or no overlapping regions by considering object structure, we propose a novel Structure-Aware Correspondence Learning method for Relative Pose Estimation, which consists of two key modules. First, a structure-aware keypoint extraction module is designed to locate a set of keypoints that can represent the structure of objects with different shapes and appearances, under the guidance of a keypoint-based image reconstruction loss. Second, a structure-aware correspondence estimation module is designed to model the intra-image and inter-image relationships between keypoints to extract structure-aware features for correspondence estimation. By jointly leveraging these two modules, the proposed method can naturally estimate 3D-3D correspondences for unseen objects without explicit feature matching for precise relative pose estimation. Experimental results on the CO3D, Objaverse and LineMOD datasets demonstrate that the proposed method significantly outperforms prior methods, i.e., with a 5.7° reduction in mean angular error on the CO3D dataset.
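Once 3D-3D correspondences are available, the final step of recovering a relative pose reduces to rigid alignment of two paired point sets. The paper does not specify its solver, but a standard choice for this step is the SVD-based Kabsch/Umeyama least-squares fit; the sketch below (plain NumPy, function name is illustrative) shows how a rotation and translation can be recovered from paired keypoints:

```python
import numpy as np

def relative_pose_from_correspondences(src: np.ndarray, dst: np.ndarray):
    """Least-squares rigid alignment (Kabsch): find R, t with dst ≈ src @ R.T + t.

    src, dst: (N, 3) arrays of paired 3D keypoints. This is a generic
    solver sketch, not the paper's specific implementation.
    """
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - src_mean, dst - dst_mean

    # 3x3 cross-covariance of the centered correspondences.
    H = src_c.T @ dst_c
    U, _, Vt = np.linalg.svd(H)

    # Correct for an improper rotation (reflection) if det < 0.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t
```

In practice, correspondence-based pipelines typically wrap a solver like this in RANSAC or weight the correspondences by predicted confidence to handle outliers.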