🤖 AI Summary
This work addresses the longstanding trade-off in visual localization between 2D representations, which are lightweight and easy to maintain but lack strong geometric reasoning, and 3D representations, which offer high accuracy but are difficult to update and scale. To bridge this gap, the authors propose an enhanced 2D image representation that incorporates estimated depth maps to inject geometric structure while preserving the efficiency and scalability of 2D systems. By integrating dense feature matching, compact map compression, and a GPU-accelerated LO-RANSAC implementation, the method achieves state-of-the-art accuracy across multiple standard benchmarks. It outperforms existing memory-efficient approaches at comparable map sizes while offering a flexible trade-off between accuracy and storage.
📝 Abstract
Existing visual localization methods are typically either 2D image-based, which are easy to build and maintain but limited in effective geometric reasoning, or 3D structure-based, which achieve high accuracy but require a centralized reconstruction and are difficult to update. In this work, we revisit visual localization with a 2D image-based representation and propose to augment each image with estimated depth maps to capture the geometric structure. Supported by the effective use of dense matchers, this representation is not only easy to build and maintain, but also achieves the highest accuracy in challenging conditions. With compact compression and a GPU-accelerated LO-RANSAC implementation, the whole pipeline is efficient in both storage and computation and allows for a flexible trade-off between accuracy and memory efficiency. Our method achieves a new state-of-the-art accuracy on various standard benchmarks and outperforms existing memory-efficient methods at comparable map sizes. Code will be available at https://github.com/cvg/Hierarchical-Localization.
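The key idea of depth-augmented 2D maps can be illustrated with a small sketch: a 2D-2D match between the query and a reference image is lifted to a 2D-3D correspondence by back-projecting the reference pixel through the camera intrinsics and scaling by its estimated depth, after which a standard PnP solver inside (LO-)RANSAC can recover the query pose. The function name, toy intrinsics, and pixel/depth values below are illustrative assumptions, not the authors' code:

```python
import numpy as np

def backproject(pixels, depths, K):
    """Lift matched 2D pixels in a reference image to 3D points
    using its estimated depth map and intrinsics K.
    Illustrative sketch only; pixels is (N, 2), depths is (N,)."""
    # Homogeneous pixel coordinates, shape (N, 3)
    pix_h = np.hstack([pixels, np.ones((pixels.shape[0], 1))])
    # Viewing rays in camera coordinates, then scale each by its depth
    rays = (np.linalg.inv(K) @ pix_h.T).T
    return rays * depths[:, None]

# Toy pinhole camera: focal length 500, principal point (320, 240)
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
pixels = np.array([[320.0, 240.0],   # principal point
                   [820.0, 240.0]])  # 500 px to the right
depths = np.array([2.0, 2.0])
pts3d = backproject(pixels, depths, K)
# pts3d -> [[0, 0, 2], [2, 0, 2]] in camera coordinates
```

The resulting 2D-3D correspondences would then be fed to a PnP + RANSAC stage (e.g. OpenCV's `solvePnPRansac`); the paper's pipeline uses dense matchers and a GPU-accelerated LO-RANSAC for this step.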