AI Summary
To address geometric reasoning from unlabeled RGB-D data, this paper proposes a self-supervised point cloud registration framework. The method introduces (1) a cycle-consistent keypoint selection mechanism, in which cycle-consistent keypoints serve as salient points that enforce spatial coherence constraints during matching, and (2) a pose estimation block that combines a GRU recurrent unit with transformation synchronization, blending historical and multi-view information. Without requiring manual annotations, the framework outperforms previous self-supervised registration methods on ScanNet and 3DMatch, and even surpasses some older supervised methods. The components can also be integrated into existing registration methods, where they prove effective.
Abstract
With the rise in consumer depth cameras, a wealth of unlabeled RGB-D data has become available. This prompts the question of how to utilize this data for geometric reasoning of scenes. While many RGB-D registration methods rely on geometric and feature-based similarity, we take a different approach. We use cycle-consistent keypoints as salient points to enforce spatial coherence constraints during matching, improving correspondence accuracy. Additionally, we introduce a novel pose block that combines a GRU recurrent unit with transformation synchronization, blending historical and multi-view data. Our approach surpasses previous self-supervised registration methods on ScanNet and 3DMatch, even outperforming some older supervised methods. We also integrate our components into existing methods, showing their effectiveness.
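The abstract does not spell out how cycle consistency is checked, so the following is only a generic sketch of the idea, not the paper's implementation. It assumes per-point descriptors are already available, keeps a pair only when the nearest-neighbour match from view A to view B maps back to the same point in A, and then fits a rigid transform to the surviving pairs with the standard Kabsch (SVD) solution. The function names `cycle_consistent_matches` and `kabsch`, and the synthetic data, are hypothetical and chosen for illustration; the GRU pose block and transformation synchronization are not modelled here.

```python
import numpy as np


def cycle_consistent_matches(feat_a, feat_b):
    """Return index pairs (i, j) whose nearest-neighbour cycle A -> B -> A closes."""
    # Pairwise squared Euclidean distances between descriptors, shape (Na, Nb).
    d = ((feat_a[:, None, :] - feat_b[None, :, :]) ** 2).sum(axis=-1)
    a_to_b = d.argmin(axis=1)  # best match in B for each point in A
    b_to_a = d.argmin(axis=0)  # best match in A for each point in B
    idx_a = np.arange(feat_a.shape[0])
    keep = b_to_a[a_to_b] == idx_a  # cycle returns to its starting point
    return np.stack([idx_a[keep], a_to_b[keep]], axis=1)


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ≈ src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    sign = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, sign]) @ U.T
    t = dst_c - R @ src_c
    return R, t


# Synthetic example (assumed data): view B is a rotated, shifted, shuffled copy of view A.
rng = np.random.default_rng(0)
pts_a = rng.normal(size=(50, 3))
feat_a = rng.normal(size=(50, 16))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 1.0])
perm = rng.permutation(50)
pts_b = (pts_a @ R_true.T + t_true)[perm]
feat_b = feat_a[perm]

matches = cycle_consistent_matches(feat_a, feat_b)
R_est, t_est = kabsch(pts_a[matches[:, 0]], pts_b[matches[:, 1]])
```

With noise-free descriptors every point survives the cycle check, so `R_est` and `t_est` should agree with `R_true` and `t_true` up to floating-point error. On real data the check would discard ambiguous matches before the pose is estimated.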