ImLPR: Image-based LiDAR Place Recognition using Vision Foundation Models

πŸ“… 2025-05-23
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
LiDAR place recognition (LPR) lacks dedicated 3D foundation models, and transferring knowledge from visual foundation models (VFMs) remains challenging due to the modality mismatch between images and point clouds. Method: This work brings DINOv2, a state-of-the-art VFM, into LPR for the first time. It adapts the VFM to the LiDAR domain through a Range Image View (RIV) representation, adds a lightweight MultiConv adapter, and trains end-to-end with a Patch-InfoNCE contrastive loss. The authors also show that RIV offers greater representational capacity than Bird's-Eye-View (BEV) for this task. Results: The approach sets a new state of the art on multiple public LiDAR datasets, achieving top intra-session and inter-session Recall@1 and F1 scores across various LiDAR sensors. Code and pretrained models are publicly released.

πŸ“ Abstract
LiDAR Place Recognition (LPR) is a key component in robotic localization, enabling robots to align current scans with prior maps of their environment. While Visual Place Recognition (VPR) has embraced Vision Foundation Models (VFMs) to enhance descriptor robustness, LPR has relied on task-specific models with limited use of pre-trained foundation-level knowledge. This is due to the lack of 3D foundation models and the challenges of using VFM with LiDAR point clouds. To tackle this, we introduce ImLPR, a novel pipeline that employs a pre-trained DINOv2 VFM to generate rich descriptors for LPR. To our knowledge, ImLPR is the first method to leverage a VFM to support LPR. ImLPR converts raw point clouds into Range Image Views (RIV) to leverage VFM in the LiDAR domain. It employs MultiConv adapters and Patch-InfoNCE loss for effective feature learning. We validate ImLPR using public datasets where it outperforms state-of-the-art (SOTA) methods in intra-session and inter-session LPR with top Recall@1 and F1 scores across various LiDARs. We also demonstrate that RIV outperforms Bird's-Eye-View (BEV) as a representation choice for adapting LiDAR for VFM. We release ImLPR as open source for the robotics community.
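The RIV conversion described above is, at its core, a spherical projection of the point cloud onto an image grid. A minimal NumPy sketch of such a projection follows; the image resolution, vertical field of view, and nearest-return handling here are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def point_cloud_to_range_image(points, h=64, w=1024,
                               fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) LiDAR point cloud to an (h, w) range image via
    spherical projection (a generic RIV construction; parameters are
    illustrative, not the paper's)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1)                 # per-point range
    yaw = np.arctan2(y, x)                                    # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))

    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)
    fov = fov_up - fov_down

    # Normalize angles to [0, 1], then scale to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * w                         # column index
    v = (1.0 - (pitch - fov_down) / fov) * h                  # row index
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    # Keep the nearest return per pixel: write far points first so that
    # closer points overwrite them.
    order = np.argsort(-r)
    img = np.zeros((h, w), dtype=np.float32)
    img[v[order], u[order]] = r[order]
    return img
```

Real pipelines typically stack additional channels (e.g. intensity, normals) alongside range so the image matches the VFM's expected input depth.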
Problem

Research questions and friction points this paper is trying to address.

LPR lacks pre-trained foundation models for robust descriptors
Modality mismatch makes VFMs difficult to apply to LiDAR point clouds
An effective LiDAR-to-image conversion is needed to exploit VFMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses DINOv2 VFM for LiDAR place recognition
Converts point clouds to Range Image Views
Employs MultiConv adapters and Patch-InfoNCE loss
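The Patch-InfoNCE idea can be illustrated with a generic patch-level InfoNCE objective: corresponding patches from two views of the same place act as positives, and all other patches act as negatives. This NumPy sketch shows the shape of that loss only; the paper's exact formulation, positive-pair mining, and temperature may differ:

```python
import numpy as np

def patch_info_nce(feats_a, feats_b, temperature=0.07):
    """Generic patch-level InfoNCE (a sketch, not the paper's exact loss).
    feats_a, feats_b: (N, D) patch descriptors from two views of the same
    place; patch i in A pairs with patch i in B, the rest are negatives."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                    # (N, N) cosine similarities

    # Cross-entropy with the diagonal (matching patch) as the target class,
    # computed with a stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

When the two views' descriptors agree, the diagonal dominates the similarity matrix and the loss approaches zero; mismatched descriptors push the loss toward log(N).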