Cross-modal feature fusion for robust point cloud registration with ambiguous geometry

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Point cloud registration suffers from poor robustness in geometrically ambiguous scenes (e.g., symmetric structures, planar regions). This paper addresses the problem with CoFF, a cross-modal registration method that jointly leverages RGB radiometric information and 3D geometric features. The core contribution is a two-stage cross-modal feature fusion mechanism: first, learned 3D point features are enhanced with pixel-wise image features obtained by projecting the point cloud onto the RGB images; second, patch-wise image features are combined with superpoint features to improve coarse matching in regions where geometry alone is ambiguous. The method integrates joint point cloud–image encoding, superpoint-based coarse matching, and a coarse-to-fine matching strategy. It achieves state-of-the-art performance on 3DMatch, 3DLoMatch, IndoorLRS, and ScanNet++, with registration recalls of 95.9% on 3DMatch and 81.6% on 3DLoMatch, substantially outperforming geometry-only approaches.
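The first fusion stage described above, assigning pixel-wise image features to 3D points, can be sketched roughly as follows. This is an illustrative sketch only, assuming points already expressed in the camera frame and a known pinhole intrinsics matrix; the function names are hypothetical and do not come from the paper's code:

```python
import numpy as np

def project_points(points, K):
    """Project (N, 3) camera-frame points to pixel coordinates
    using a 3x3 pinhole intrinsics matrix K."""
    uvw = points @ K.T                 # homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide -> (u, v)

def fuse_pixel_features(points, point_feats, image_feats, K):
    """Assign each 3D point the image feature at its projected pixel
    (nearest-neighbour sampling), then concatenate it with the point's
    geometric feature.

    points:      (N, 3) points in the camera frame
    point_feats: (N, Dp) learned geometric features
    image_feats: (H, W, Di) per-pixel image features
    K:           (3, 3) camera intrinsics
    """
    H, W, _ = image_feats.shape
    uv = np.round(project_points(points, K)).astype(int)
    u = np.clip(uv[:, 0], 0, W - 1)    # clamp to image bounds
    v = np.clip(uv[:, 1], 0, H - 1)
    pixel_feats = image_feats[v, u]    # (N, Di) sampled pixel features
    return np.concatenate([point_feats, pixel_feats], axis=1)
```

In practice the sampling would typically be bilinear rather than nearest-neighbour, and occluded points would need visibility checks; both are omitted here for brevity.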

📝 Abstract
Point cloud registration has seen significant advancements with the application of deep learning techniques. However, existing approaches often overlook the potential of integrating radiometric information from RGB images. This limitation reduces their effectiveness in aligning point cloud pairs, especially in regions where geometric data alone is insufficient. When used effectively, radiometric information can enhance the registration process by providing context that is missing from purely geometric data. In this paper, we propose CoFF, a novel Cross-modal Feature Fusion method that utilizes both point cloud geometry and RGB images for pairwise point cloud registration. Assuming that the co-registration between point clouds and RGB images is available, CoFF explicitly addresses the challenges where geometric information alone is unclear, such as in regions with symmetric similarity or planar structures, through a two-stage fusion of 3D point cloud features and 2D image features. It incorporates a cross-modal feature fusion module that assigns pixel-wise image features to 3D input point clouds to enhance learned 3D point features, and integrates patch-wise image features with superpoint features to improve the quality of coarse matching. This is followed by a coarse-to-fine matching module that accurately establishes correspondences using the fused features. We extensively evaluate CoFF on four common datasets: 3DMatch, 3DLoMatch, IndoorLRS, and the recently released ScanNet++ dataset. In addition, we assess CoFF on specific subset datasets containing geometrically ambiguous cases. Our experimental results demonstrate that CoFF achieves state-of-the-art registration performance across all benchmarks, including remarkable registration recalls of 95.9% and 81.6% on the widely used 3DMatch and 3DLoMatch datasets, respectively...(Truncated to fit arXiv abstract length)
Problem

Research questions and friction points this paper is trying to address.

Integrating radiometric data with geometric information for point cloud registration
Addressing ambiguous geometry in point clouds using cross-modal feature fusion
Improving registration accuracy in symmetric or planar regions via RGB fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fuses 3D point cloud and 2D RGB image features
Uses cross-modal feature fusion for ambiguous geometry
Implements coarse-to-fine matching with fused features
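The coarse stage of the coarse-to-fine matching listed above can be sketched as mutual nearest-neighbour search in the fused feature space. This is a minimal illustration of the general technique, not the authors' implementation; the fine stage would repeat a similar search among points inside each matched superpoint patch:

```python
import numpy as np

def mutual_nearest_matches(feats_a, feats_b):
    """Coarse matching: return index pairs (i, j) such that superpoint i of
    cloud A and superpoint j of cloud B are each other's nearest neighbour
    under cosine similarity of their fused features."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    sim = a @ b.T                      # (Na, Nb) cosine similarity matrix
    nn_ab = sim.argmax(axis=1)         # best B-match for each A superpoint
    nn_ba = sim.argmax(axis=0)         # best A-match for each B superpoint
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]
```

The mutual-nearest-neighbour constraint filters out one-sided matches, which is what makes coarse correspondences reliable enough to seed the fine matching step.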
Zhaoyi Wang
ETH Zurich
Laser Scanning · Photogrammetry · Remote Sensing · 3D Vision · GeoAI
Shengyu Huang
Research Scientist, NVIDIA
Computer Vision · Deep Learning
J. Butt
ETH Zürich, Institute of Geodesy and Photogrammetry, 8093 Zürich, Switzerland; Atlas optimization GmbH, 8049 Zürich, Switzerland
Yuanzhou Cai
University of Zürich, 8006 Zürich, Switzerland
Matej Varga
ETH Zürich, IGP, GSEG
Geodesy · GNSS · Geomatics · Photogrammetry · Remote Sensing
Andreas Wieser
ETH Zürich, Institute of Geodesy and Photogrammetry, 8093 Zürich, Switzerland