🤖 AI Summary
Existing point cloud descriptors have limited local receptive fields, which makes it hard to achieve both feature distinctiveness and long-range contextual modeling. To address this, we propose a two-stage Transformer architecture based on local attention. First, locality-sensitive hashing (LSH) partitions each point cloud into non-overlapping windows with linear complexity; within each window, a Group Transformer captures long-range dependencies. Second, an Interaction Transformer explicitly strengthens feature interactions between the overlapping regions of a point cloud pair. This design uses locality as an inductive bias to expand the receptive field in a principled way while preserving computational efficiency and improving feature discriminability. Our method achieves state-of-the-art registration accuracy on multiple real-world indoor and outdoor benchmarks, demonstrating its effectiveness in learning robust, highly discriminative point cloud descriptors.
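The LSH window partitioning described above can be illustrated with a minimal sketch. It is not the paper's implementation: the hash family (a single random projection quantized into buckets), the function name `lsh_windows`, and the parameters `window_size` and `bucket_width` are all assumptions made for illustration. Points are hashed, ordered by hash code, and cut into nearly equal, non-overlapping windows.

```python
import numpy as np

def lsh_windows(points, window_size, bucket_width=0.1, seed=0):
    """Split points into nearly equal, non-overlapping windows using a
    random-projection hash (hypothetical sketch, not the paper's code)."""
    rng = np.random.default_rng(seed)
    n, d = points.shape
    a = rng.normal(size=d)                 # random projection direction
    b = rng.uniform(0.0, bucket_width)     # random offset
    # Points that are close along the projection fall into the same
    # or neighbouring buckets.
    codes = np.floor((points @ a + b) / bucket_width).astype(np.int64)
    # Order points by hash code, then cut the ordering into windows.
    order = np.argsort(codes, kind="stable")
    n_windows = -(-n // window_size)       # ceiling division
    return np.array_split(order, n_windows)

# Usage: each window is an array of point indices on which attention
# could be computed independently.
pts = np.random.default_rng(1).random((1000, 3))
windows = lsh_windows(pts, window_size=64)
```

Computing the hash codes is linear in the number of points; the sort used here for simplicity adds a logarithmic factor, so this sketch does not by itself reproduce the linear complexity claimed for the method.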
📝 Abstract
Most existing learning-based point cloud descriptors for point cloud registration focus on perceiving local information to generate distinctive features. However, a reasonable and broader receptive field is essential for enhancing feature distinctiveness. In this paper, we propose a Local Attentive Hashing Network for point cloud registration, called LAHNet, which introduces into point cloud descriptors a local attention mechanism with the locality inductive bias of convolution-like operators. Specifically, a Group Transformer is designed to capture reasonable long-range context between points. It employs Locality-Sensitive Hashing, a linear neighborhood search strategy, to uniformly partition point clouds into non-overlapping windows. Meanwhile, an efficient cross-window strategy is adopted to further expand the feature receptive field. Furthermore, building on this windowing strategy, we propose an Interaction Transformer to enhance the feature interactions of the overlap regions within point cloud pairs. It represents each window as a global signal and computes an overlap matrix to match overlap regions between the two point clouds. Extensive results demonstrate that LAHNet learns robust and distinctive features and achieves strong registration results on real-world indoor and outdoor benchmarks.
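The overlap matrix mentioned above can be sketched as follows. The abstract does not specify how a window becomes a "global signal" or how signals are compared, so this example assumes mean pooling of per-point features and cosine similarity; the helper names `window_signals` and `overlap_matrix` are hypothetical.

```python
import numpy as np

def window_signals(features, windows):
    """Summarize each window as one vector by mean-pooling its point
    features (assumed pooling; the paper may use a different signal)."""
    return np.stack([features[idx].mean(axis=0) for idx in windows])

def overlap_matrix(sig_src, sig_tgt, eps=1e-8):
    """Cosine similarity between every source window and every target
    window. Entry (i, j) scores how likely the two windows overlap."""
    src = sig_src / (np.linalg.norm(sig_src, axis=1, keepdims=True) + eps)
    tgt = sig_tgt / (np.linalg.norm(sig_tgt, axis=1, keepdims=True) + eps)
    return src @ tgt.T

# Usage with random stand-in features: 200 points, 16 channels, 5 windows.
rng = np.random.default_rng(0)
feats_src = rng.normal(size=(200, 16))
feats_tgt = rng.normal(size=(200, 16))
wins = np.array_split(np.arange(200), 5)
scores = overlap_matrix(window_signals(feats_src, wins),
                        window_signals(feats_tgt, wins))
best_match = scores.argmax(axis=1)  # most similar target window per source window
```

Under these assumptions, the highest-scoring window pairs would be the candidates on which cross-cloud feature interaction is concentrated.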