ImLPR: Image-based LiDAR Place Recognition using Vision Foundation Models

πŸ“… 2025-05-23
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
LiDAR place recognition (LPR) lacks dedicated 3D foundation models, and transferring knowledge from visual foundation models (VFMs) remains challenging due to the modality mismatch between images and point clouds. Method: This work brings DINOv2, a state-of-the-art VFM, into LPR for the first time. It adapts the VFM to the LiDAR domain through a Range Image View (RIV) representation, adds a lightweight MultiConv adapter, and trains end-to-end with a Patch-InfoNCE contrastive loss. The authors also show that RIV offers greater representational capacity than Bird's-Eye-View (BEV) for this task. Results: The approach sets a new state of the art on multiple public LiDAR datasets, achieving top intra-session and inter-session Recall@1 and F1 scores across various LiDAR sensors. Code and pretrained models are publicly released.

πŸ“ Abstract
LiDAR Place Recognition (LPR) is a key component in robotic localization, enabling robots to align current scans with prior maps of their environment. While Visual Place Recognition (VPR) has embraced Vision Foundation Models (VFMs) to enhance descriptor robustness, LPR has relied on task-specific models with limited use of pre-trained foundation-level knowledge. This is due to the lack of 3D foundation models and the challenges of using VFM with LiDAR point clouds. To tackle this, we introduce ImLPR, a novel pipeline that employs a pre-trained DINOv2 VFM to generate rich descriptors for LPR. To our knowledge, ImLPR is the first method to leverage a VFM to support LPR. ImLPR converts raw point clouds into Range Image Views (RIV) to leverage VFM in the LiDAR domain. It employs MultiConv adapters and Patch-InfoNCE loss for effective feature learning. We validate ImLPR using public datasets where it outperforms state-of-the-art (SOTA) methods in intra-session and inter-session LPR with top Recall@1 and F1 scores across various LiDARs. We also demonstrate that RIV outperforms Bird's-Eye-View (BEV) as a representation choice for adapting LiDAR for VFM. We release ImLPR as open source for the robotics community.
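The RIV conversion described above is, at its core, a spherical projection of the point cloud onto an image grid. A minimal NumPy sketch of such a projection follows; the image resolution, vertical field of view, and nearest-return handling here are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def point_cloud_to_range_image(points, h=64, w=1024,
                               fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) LiDAR point cloud to an (h, w) range image via
    spherical projection (a generic RIV construction; parameters are
    illustrative, not the paper's)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1)                 # per-point range
    yaw = np.arctan2(y, x)                                    # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))

    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)
    fov = fov_up - fov_down

    # Normalize angles to [0, 1], then scale to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * w                         # column index
    v = (1.0 - (pitch - fov_down) / fov) * h                  # row index
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    # Keep the nearest return per pixel: write far points first so that
    # closer points overwrite them.
    order = np.argsort(-r)
    img = np.zeros((h, w), dtype=np.float32)
    img[v[order], u[order]] = r[order]
    return img
```

Real pipelines typically stack additional channels (e.g. intensity, normals) alongside range so the image matches the VFM's expected input depth.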
Problem

Research questions and friction points this paper is trying to address.

LPR lacks pre-trained foundation models for robust descriptors
Modality mismatch makes VFMs difficult to apply to LiDAR point clouds
An effective LiDAR-to-image conversion is needed to exploit VFMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses DINOv2 VFM for LiDAR place recognition
Converts point clouds to Range Image Views
Employs MultiConv adapters and Patch-InfoNCE loss
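The Patch-InfoNCE idea can be illustrated with a generic patch-level InfoNCE objective: corresponding patches from two views of the same place act as positives, and all other patches act as negatives. This NumPy sketch shows the shape of that loss only; the paper's exact formulation, positive-pair mining, and temperature may differ:

```python
import numpy as np

def patch_info_nce(feats_a, feats_b, temperature=0.07):
    """Generic patch-level InfoNCE (a sketch, not the paper's exact loss).
    feats_a, feats_b: (N, D) patch descriptors from two views of the same
    place; patch i in A pairs with patch i in B, the rest are negatives."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                    # (N, N) cosine similarities

    # Cross-entropy with the diagonal (matching patch) as the target class,
    # computed with a stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

When the two views' descriptors agree, the diagonal dominates the similarity matrix and the loss approaches zero; mismatched descriptors push the loss toward log(N).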