LiDAR Registration with Visual Foundation Models

📅 2025-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
LiDAR registration faces significant challenges under domain shifts, seasonal changes, and variations in point cloud structure, including fragile correspondence establishment and poor generalization. To address this, we propose a zero-shot multimodal registration method that requires no retraining: leveraging surround-view camera images, we extract robust visual descriptors with the DINOv2 foundation model and integrate them into a conventional RANSAC + ICP pipeline to achieve 6DoF LiDAR-to-map alignment. This work marks the first successful transfer of vision foundation model features to LiDAR registration, decoupling performance from point cloud density and geometric structure. Evaluated on the NCLT and Oxford RobotCar datasets, our method achieves absolute improvements of 24.8% and 17.3% in registration recall, respectively, substantially outperforming state-of-the-art approaches. To foster reproducibility and further research, we publicly release both the benchmark and the implementation code.
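The core idea of the summary, lifting image features onto LiDAR points, can be sketched as follows. This is a minimal illustration, not the authors' implementation: each 3D point in the camera frame is projected through the pinhole intrinsics and assigned the descriptor of the pixel it lands on, assuming a dense per-pixel feature map (e.g., upsampled DINOv2 patch features) is already available. The function name `lift_features` and the nearest-pixel sampling are illustrative choices.

```python
import numpy as np

def lift_features(points_cam, K, feat_map):
    """Assign each 3D point (in the camera frame) a descriptor sampled from
    a dense per-pixel feature map of shape (H, W, D).

    points_cam : (N, 3) points in camera coordinates
    K          : (3, 3) pinhole intrinsic matrix
    feat_map   : (H, W, D) per-pixel descriptors (e.g., upsampled DINOv2 features)
    Returns (N, D) descriptors and an (N,) validity mask; points behind the
    camera or outside the image get a zero descriptor and valid=False.
    """
    H, W, D = feat_map.shape
    z = points_cam[:, 2]
    uvw = points_cam @ K.T                      # homogeneous pixel coordinates
    with np.errstate(divide="ignore", invalid="ignore"):
        u = uvw[:, 0] / z
        v = uvw[:, 1] / z
    valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    desc = np.zeros((len(points_cam), D))
    ui = np.clip(u[valid].astype(int), 0, W - 1)  # nearest-pixel sampling
    vi = np.clip(v[valid].astype(int), 0, H - 1)
    desc[valid] = feat_map[vi, ui]
    return desc, valid
```

With surround-view cameras, this projection is repeated per camera with the corresponding extrinsics, so most LiDAR points receive a descriptor from at least one view.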

📝 Abstract
LiDAR registration is a fundamental task in robotic mapping and localization. A critical component of aligning two point clouds is identifying robust point correspondences using point descriptors. This step becomes particularly challenging in scenarios involving domain shifts, seasonal changes, and variations in point cloud structures. These factors substantially impact both handcrafted and learning-based approaches. In this paper, we address these problems by proposing to use DINOv2 features, obtained from surround-view images, as point descriptors. We demonstrate that coupling these descriptors with traditional registration algorithms, such as RANSAC or ICP, facilitates robust 6DoF alignment of LiDAR scans with 3D maps, even when the map was recorded more than a year before. Although conceptually straightforward, our method substantially outperforms more complex baseline techniques. In contrast to previous learning-based point descriptors, our method does not require domain-specific retraining and is agnostic to the point cloud structure, effectively handling both sparse LiDAR scans and dense 3D maps. We show that leveraging the additional camera data enables our method to outperform the best baseline by +24.8 and +17.3 registration recall on the NCLT and Oxford RobotCar datasets. We publicly release the registration benchmark and the code of our work on https://vfm-registration.cs.uni-freiburg.de.
Problem

Research questions and friction points this paper is trying to address.

Enhance LiDAR registration robustness using visual features.
Address domain shifts and seasonal changes in point clouds.
Improve alignment without domain-specific retraining.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses DINOv2 features as point descriptors.
Integrates with classical RANSAC and ICP registration.
Handles both sparse LiDAR scans and dense 3D maps.
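A minimal NumPy sketch of the matching and alignment stage these points describe: per-point descriptors (here synthetic stand-ins for the DINOv2 features) are matched by nearest neighbour, and RANSAC over three-point samples with a Kabsch fit recovers the rigid 6DoF transform. The threshold, iteration count, and function names are illustrative assumptions; the paper's pipeline additionally refines the result with ICP.

```python
import numpy as np

def match_descriptors(desc_a, desc_b):
    # Brute-force nearest-neighbour matching in descriptor space.
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    return np.argmin(d, axis=1)

def rigid_from_correspondences(src, dst):
    # Kabsch: least-squares R, t with R @ src_i + t ~= dst_i.
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T                     # reflection-safe rotation
    return R, cd - R @ cs

def ransac_register(src, dst, matches, iters=200, thresh=0.1, seed=0):
    # RANSAC over minimal 3-point samples, refit on the best inlier set.
    rng = np.random.default_rng(seed)
    dst_m = dst[matches]
    best_R, best_t, best_n = np.eye(3), np.zeros(3), -1
    for _ in range(iters):
        idx = rng.choice(len(src), 3, replace=False)
        R, t = rigid_from_correspondences(src[idx], dst_m[idx])
        resid = np.linalg.norm(src @ R.T + t - dst_m, axis=1)
        inl = resid < thresh
        if inl.sum() > best_n:
            best_n = int(inl.sum())
            best_R, best_t = rigid_from_correspondences(src[inl], dst_m[inl])
    return best_R, best_t
```

Because the descriptors carry the appearance information, this stage is agnostic to point density: the same matching code runs on a sparse scan against a dense map, which is what lets the approach skip domain-specific retraining.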