🤖 AI Summary
In detection-free image-to-point-cloud registration, performance degradation arises from inter-modal discrepancies in channel attention and from spurious correspondences induced by structurally similar scene regions. To address these challenges, this paper proposes an end-to-end robust registration framework. Its core contributions are: (1) a Channel-Adaptive Adjustment (CAA) module that dynamically calibrates cross-modal feature channel sensitivity; and (2) a Global Optimal Selection (GOS) module that replaces local nearest-neighbor matching with global optimization to suppress erroneous correspondences. The framework integrates a coarse-to-fine strategy, channel-attention-guided matching, and feature decoupling for enhanced representation learning, tailored to heterogeneous RGB-D images and sparse point clouds. Evaluated on RGB-D Scenes V2 and 7-Scenes, the method achieves state-of-the-art performance, significantly improving both dense pixel-to-point correspondence accuracy and registration stability.
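The channel-calibration idea behind CAA can be illustrated with a squeeze-and-excitation-style reweighting: boost the channels a modality responds to strongly and damp the weak ones before cross-modal matching. This is a minimal, hypothetical numpy sketch (the function name `channel_reweight` and the temperature `gamma` are illustrative assumptions, not the paper's actual module):

```python
import numpy as np

def channel_reweight(feats, gamma=2.0):
    """Reweight feature channels by a softmax over their mean activation.

    feats: (N, C) patch features from one modality.
    gamma: hypothetical temperature controlling how sharply strong
           channels are emphasized over weak ones.
    """
    mean_act = np.abs(feats).mean(axis=0)        # (C,) per-channel energy
    logits = gamma * mean_act
    w = np.exp(logits - logits.max())            # stable softmax
    w = w / w.sum() * feats.shape[1]             # normalize, keep overall scale
    return feats * w                             # broadcast weights over channels

rng = np.random.default_rng(0)
img_feats = rng.normal(size=(8, 4))              # toy image-patch features
out = channel_reweight(img_feats)
```

The same reweighting would be applied per modality, so that image and point-cloud features emphasize comparable channels before patch-level matching.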
📝 Abstract
Detection-free methods typically follow a coarse-to-fine pipeline, extracting image and point cloud features for patch-level matching and then refining dense pixel-to-point correspondences. However, differences in feature channel attention between images and point clouds may degrade matching results, ultimately impairing registration accuracy. Furthermore, similar structures in the scene can produce redundant correspondences in cross-modal matching. To address these issues, we propose a Channel Adaptive Adjustment (CAA) module and a Global Optimal Selection (GOS) module. CAA enhances intra-modal features and suppresses sensitivity to cross-modal discrepancies, while GOS replaces local nearest-neighbor selection with global optimization. Experiments on RGB-D Scenes V2 and 7-Scenes demonstrate the superiority of our method, which achieves state-of-the-art performance in image-to-point cloud registration.
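The failure mode GOS targets can be seen in a toy example: when two image patches are both closest to the same point-cloud patch (a "similar structures" scenario), independent nearest-neighbor selection yields conflicting correspondences, whereas a globally optimal one-to-one assignment resolves them. The cost matrix and brute-force solver below are an illustrative sketch, not the paper's actual GOS formulation; at real scale one would use a Hungarian or optimal-transport solver instead of enumerating permutations:

```python
import itertools
import numpy as np

# Toy matching costs: rows = image patches, columns = point-cloud patches,
# lower is better. Rows 0 and 1 are both cheapest against column 0.
cost = np.array([
    [0.10, 0.80, 0.90],
    [0.12, 0.85, 0.20],
    [0.70, 0.15, 0.95],
])

# Local nearest-neighbor selection: each image patch independently takes its
# cheapest point patch. Patches 0 and 1 both claim column 0 -> redundant,
# conflicting correspondences.
local = cost.argmin(axis=1)

# Global optimal selection: the one-to-one assignment minimizing total cost
# over all permutations (brute force is fine at this toy size).
best = min(itertools.permutations(range(3)),
           key=lambda p: sum(cost[i, p[i]] for i in range(3)))
```

Here `local` is `[0, 0, 1]` (column 0 claimed twice), while the global assignment is `(0, 2, 1)`: patch 1 yields column 0 to patch 0 and takes its second-best column 2 instead, so every point patch is matched at most once.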