🤖 AI Summary
To address the trade-off between efficiency and accuracy in visual localization on edge devices, this paper proposes PRAM, a framework that leverages self-supervised 3D landmark generation, requiring no semantic labels, to replace redundant global and repetitive local descriptors with sparse, geometrically meaningful keypoints. These keypoints drive a lightweight Transformer for landmark recognition and landmark-wise 2D-3D matching, combined with label-based outlier rejection. The core innovation is the landmark-centric paradigm: abandoning pixel-level and scene-level matching in favor of sparse, geometry-consistent, landmark-driven correspondence. Experiments on large-scale indoor and outdoor scenes show that PRAM matches hierarchical methods in accuracy while significantly outperforming absolute pose regression (APR) and scene coordinate regression (SCR) approaches. It reduces memory footprint by over 90% and accelerates inference by 2.4×, achieving a markedly better balance between high accuracy and high efficiency on edge devices.
📝 Abstract
Visual localization is a key technique for a variety of applications, e.g., autonomous driving, AR/VR, and robotics. For these real-world applications, both efficiency and accuracy are important, especially on edge devices with limited computing resources. However, previous frameworks, e.g., absolute pose regression (APR), scene coordinate regression (SCR), and the hierarchical method (HM), are limited in either accuracy or efficiency in both indoor and outdoor environments. In this paper, we propose the place recognition anywhere model (PRAM), a new framework that performs visual localization efficiently and accurately by recognizing 3D landmarks. Specifically, PRAM first generates landmarks directly in 3D space in a self-supervised manner. Because these 3D landmarks do not rely on commonly used classic semantic labels, they can be defined anywhere in indoor and outdoor scenes, giving higher generalization ability. By representing the map with 3D landmarks, PRAM discards global descriptors, repetitive local descriptors, and redundant 3D points, significantly increasing memory efficiency. Then, sparse keypoints, rather than dense pixels, are used as input tokens to a transformer-based recognition module, which enables PRAM to recognize hundreds of landmarks with high time and memory efficiency. At test time, sparse keypoints and predicted landmark labels are used for outlier removal and landmark-wise 2D-3D matching, as opposed to exhaustive 2D-2D matching, which further increases time efficiency. A comprehensive evaluation of APRs, SCRs, HMs, and PRAM on both indoor and outdoor datasets demonstrates that PRAM outperforms APRs and SCRs in large-scale scenes by a large margin and achieves accuracy competitive with HMs, while reducing memory cost by over 90% and running 2.4 times faster, leading to a better balance between efficiency and accuracy.
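The landmark-wise matching step described above can be illustrated with a minimal sketch: each detected keypoint carries a predicted landmark label, keypoints labeled as outliers are dropped, and the remaining keypoints are matched only against the 3D points belonging to their own landmark rather than the whole map. The function name, similarity threshold, and data layout below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def landmark_wise_match(kpt_desc, kpt_labels, landmark_db, sim_thresh=0.8):
    """Match 2D keypoints to 3D points restricted to each keypoint's
    predicted landmark, instead of searching the entire map.

    kpt_desc    : (N, D) L2-normalized local descriptors of detected keypoints
    kpt_labels  : (N,) predicted landmark id per keypoint (-1 = outlier)
    landmark_db : dict mapping landmark id -> (descs (M, D), points3d (M, 3))
    Returns a list of (keypoint index, 3D point) correspondences.
    """
    matches = []
    for i, (desc, label) in enumerate(zip(kpt_desc, kpt_labels)):
        if label < 0 or label not in landmark_db:
            continue  # predicted outlier: rejected before matching
        db_desc, db_pts = landmark_db[label]
        sims = db_desc @ desc          # cosine similarities within one landmark
        j = int(np.argmax(sims))       # best 3D point of this landmark only
        if sims[j] >= sim_thresh:
            matches.append((i, db_pts[j]))
    return matches
```

The resulting 2D-3D correspondences would then feed a standard PnP + RANSAC pose solver; because each keypoint is compared against one landmark's points instead of all map points, the search cost shrinks from O(N · total points) to O(N · points per landmark).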